Nova Resource:Tools/SAL (revision 2414287, 2026-05-15T19:02:41Z, by Stashbot)
=== 2026-05-15 === * 19:02 taavi: rebooting bastions and k8s workers to pick up kernel updates === 2026-05-14 === * 16:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 16:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 15:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 15:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 13:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 13:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway * 13:30 raymond-ndibe@cloudcumin1001: END (FAIL) - 
Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component istio-gateway * 13:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway === 2026-05-13 === * 12:07 godog: resume restarting webservices using default memory requests - [[phab:T420565|T420565]] * 08:46 godog: restart sample webservices with new memory requests https://phabricator.wikimedia.org/P92497 - [[phab:T420565|T420565]] * 08:36 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 08:35 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 00:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 00:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2026-05-12 === * 23:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 23:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 22:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 22:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 22:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 21:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2026-05-11 === * 00:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component image-config * 00:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 00:39 raymond-ndibe@cloudcumin1001: 
END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component image-config * 00:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config === 2026-05-07 === * 12:02 taavi: draining tools-k8s-worker-106 to investigate [[phab:T425172|T425172]] === 2026-05-05 === * 04:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 04:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 04:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 03:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 02:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 02:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 02:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 02:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 01:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2026-04-28 === * 10:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:44 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api === 2026-04-23 === * 15:59 andrewbogott: hard rebooting tools-puppetserver-01.tools, it seems to have crashed * 09:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) * 09:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) * 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) * 09:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:12 taavi: uninstall ingress-nginx-gen2 from the cluster [[phab:T392356|T392356]] * 08:08 taavi: delete all ingress objects [[phab:T392356|T392356]] === 2026-04-21 === * 14:06 taavi: save backup of all ingress objects to ~taavi/ingresses-backup-2026-04-21.json [[phab:T392356|T392356]] * 13:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 13:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 12:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 12:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli === 2026-04-20 === * 15:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 15:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli === 2026-04-16 === * 13:09 taavi: bump istio traffic percentage 75% -> 100% [[phab:T392356|T392356]] === 
2026-04-15 === * 10:45 taavi: bump istio traffic percentage 50% -> 75% [[phab:T392356|T392356]] === 2026-04-13 === * 09:11 taavi: bump istio traffic percentage 25% -> 50% [[phab:T392356|T392356]] * 07:33 taavi: bump istio traffic percentage 10% -> 25% [[phab:T392356|T392356]] === 2026-04-10 === * 14:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 14:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 11:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 11:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 08:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 08:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway * 08:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 08:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway === 2026-04-09 === * 14:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 06:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 06:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 06:24 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx * 06:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 06:12 dcaro@cloudcumin1001: END 
(FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx * 06:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2026-04-08 === * 17:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 17:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 17:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 15:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 08:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 00:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api === 2026-04-07 === * 23:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 19:20 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 19:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 19:09 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 19:00 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 18:59 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 18:47 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 18:43 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 18:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 18:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 18:07 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T361237|T361237]]) * 18:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 17:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 17:53 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 17:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 17:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 17:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 17:18 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 17:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 17:03 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 17:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 17:01 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 17:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:59 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 16:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:59 
andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=97) * 16:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:57 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 16:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:52 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 16:50 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:48 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 16:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 16:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=0) * 15:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 15:52 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 15:51 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 15:33 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 15:31 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 15:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 
15:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 15:06 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 14:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T361237|T361237]]) * 14:43 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 14:42 andrewbogott: replacing etcd nodes with bookworm-based VMs * 14:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 13:33 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 12:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 09:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 09:51 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 09:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 09:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 09:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 09:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 08:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 08:57 dcaro@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 08:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 08:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 08:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 07:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 07:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway === 2026-04-02 === * 17:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 17:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 16:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 16:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 16:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 15:16 dcaro@cloudcumin1001: START - 
Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 14:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:11 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 10:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 10:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-04-01 === * 18:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 18:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 12:33 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 12:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 11:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 11:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 09:57 dcaro@cloudcumin1001: END 
(FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 09:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2026-03-31 === * 18:02 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api * 18:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 17:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 17:56 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 12:31 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 12:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2026-03-30 === * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 14:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 12:05 dcaro: removing wal from prometheus nodes to restart them === 2026-03-26 === * 17:30 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component wmcs-k8s-metrics * 17:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 14:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) for image docker-registry.svc.toolforge.org/cadvisor:0.56.2 * 14:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/cadvisor:0.56.2 * 14:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image docker-registry.svc.toolforge.org/cadvisor:0.56.2 * 14:20 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/cadvisor:0.56.2 * 10:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 10:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2026-03-25 === * 13:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-package-builder-04 * 13:43 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-package-builder-04 === 2026-03-24 === * 17:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 17:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 13:04 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:51 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-03-23 === * 11:16 taavi: send 10% of traffic to istio [[phab:T392356|T392356]] * 10:53 taavi: send 5% of traffic to istio [[phab:T392356|T392356]] * 10:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway === 2026-03-19 === * 20:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes * 17:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 16:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 16:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:08 dcaro@cloudcumin1001: 
START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all nodes * 14:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 11:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing * 09:07 taavi: fixing 2 tools still running ruby2.1 image to use that instead of 'ruby2' in service.manifest * 08:52 taavi: fixing 2 tools still running ruby2.5 image to use that instead of 'ruby25' in service.manifest * 08:49 taavi: fixing 12 tools still running node6 image to use that instead of 'nodejs' in service.manifest * 08:38 taavi: fixing 12 tools still running golang1.11 image to use that instead of 'golang111' in service.manifest * 08:36 taavi: fixing 60 tools still running python3.4 image to use 'python3.4' instead of 'python' in service.manifest === 2026-03-18 === * 12:00 taavi: restarting existing web services to backfill HTTPRoute resources [[phab:T392356|T392356]] * 07:37 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 07:37 filippo@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-83.tools.eqiad1.wikimedia.cloud to the cluster * 07:23 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T419824|T419824]]) === 2026-03-17 === * 12:43 taavi: shutdown tools-package-builder-04 [[phab:T401819|T401819]] === 2026-03-15 === * 03:10 andrewbogott: rebooting tools-redis-6, VM is in state ERROR === 2026-03-13 === * 22:04 taavi: reboot tools-bastion-15 [[phab:T420044|T420044]] * 19:06 taavi: 
reboot tools-bastion-14 [[phab:T420044|T420044]] === 2026-03-12 === * 13:55 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76 * 13:50 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76 === 2026-03-10 === * 11:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli * 11:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli * 09:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-03-09 === * 17:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 17:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 15:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 15:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 15:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 15:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway * 13:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a gateway role in the tools cluster * 13:30 taavi@cloudcumin1001: Added a new k8s gateway tools-k8s-gateway-3.tools.eqiad1.wikimedia.cloud to the cluster * 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a gateway role in the tools cluster * 13:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a gateway role in the tools 
cluster * 13:19 taavi@cloudcumin1001: Added a new k8s gateway tools-k8s-gateway-2.tools.eqiad1.wikimedia.cloud to the cluster * 13:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a gateway role in the tools cluster * 13:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a gateway role in the tools cluster * 13:07 taavi@cloudcumin1001: Added a new k8s gateway tools-k8s-gateway-1.tools.eqiad1.wikimedia.cloud to the cluster * 12:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a gateway role in the tools cluster === 2026-03-06 === * 11:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 11:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 11:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 11:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway * 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-system * 11:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-system === 2026-03-05 === * 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api === 2026-03-04 === * 20:10 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 19:58 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:57 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 
19:46 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:46 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 19:45 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:44 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 19:30 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:29 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 19:14 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:14 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 19:13 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:13 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:12 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:12 root@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=97) * 19:11 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:11 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:10 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:10 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:09 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:08 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:07 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:07 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:06 root@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:06 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:05 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:04 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:03 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:03 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:02 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:02 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:00 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:00 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 18:59 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 18:58 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 18:57 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 18:17 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 18:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:57 root@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node * 17:38 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 17:18 root@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node * 16:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 16:01 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component maintain-kubeusers * 14:54 dcaro: increase object quota to 400k ([[phab:T418528|T418528]]) * 14:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config * 14:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 13:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 13:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 13:42 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 13:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 13:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api === 2026-03-03 === * 20:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 20:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 16:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-03-02 === * 17:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 17:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 16:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 16:36 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component registry-admission * 16:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-02-26 === * 15:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component gateway-api * 13:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component gateway-api === 2026-02-25 === * 14:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry (exit_code=0) for Istio 1.29.0 * 14:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry for Istio 1.29.0 * 14:22 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry (exit_code=99) for Istio 1.29.0 * 14:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry for Istio 1.29.0 * 14:09 taavi: taavi@tools-imagebuilder-2:~$ sudo docker system prune -a # reclaiming disk space * 14:08 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry (exit_code=99) for Istio 1.29.0 * 14:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry for Istio 1.29.0 === 2026-02-24 === * 10:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component 
builds-builder * 10:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 10:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-02-20 === * 20:17 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24 * 20:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24 * 20:16 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24 * 20:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24 * 19:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} * 19:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} * 19:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} * 19:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} * 19:44 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image 
toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} * 19:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} === 2026-02-11 === * 21:49 taavi: remove hiera override still allowing ssh agent forwarding onto toolforge bastions [[phab:T198138|T198138]] === 2026-02-05 === * 19:08 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 18:48 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing * 16:42 volans: re-enabling puppet on NFS workers to update the infra-tracing-nfs === 2026-02-04 === * 15:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 15:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 14:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.14.3 * 14:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.14.3 === 2026-02-03 === * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 09:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 08:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.13.7 * 08:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.13.7 === 2026-01-28 === * 15:52 taavi@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 15:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2026-01-23 === * 01:10 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 01:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2026-01-22 === * 18:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 18:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2026-01-15 === * 08:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-01-14 === * 15:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 15:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions (exit_code=0) for tools-bastion-15.tools.eqiad1.wikimedia.cloud, tools-bastion-14.tools.eqiad1.wikimedia.cloud ([[phab:T413797|T413797]]) * 15:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions for tools-bastion-15.tools.eqiad1.wikimedia.cloud, tools-bastion-14.tools.eqiad1.wikimedia.cloud ([[phab:T413797|T413797]]) * 15:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for 
tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 15:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 15:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 15:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 15:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, 
tools-k8s-wo * 15:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses (exit_code=0) for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T413797|T413797]]) * 14:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T413797|T413797]]) * 14:58 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 14:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 14:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 14:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, 
tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 14:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 14:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 14:43 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 14:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 14:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:36 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade to 1.31.14 ([[phab:T413797|T413797]]) * 13:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade to 1.31.14 ([[phab:T413797|T413797]]) === 2026-01-12 === * 17:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 17:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 17:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 17:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 17:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 17:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 17:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 17:14 raymond-ndibe@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 16:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 16:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 16:20 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2026-01-06 === * 15:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 14:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 14:00 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 13:54 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 13:54 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 13:48 andrewbogott: removing tools-k8s-etcd-24 in prep for rebuilding cloudvirtlocal1003 * 13:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 03:28 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 03:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 03:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 02:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 02:53 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 02:47 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 02:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 02:39 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 01:59 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 01:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 01:48 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 01:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 01:42 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 01:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) === 2026-01-05 === * 23:17 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 23:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 23:10 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 23:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 23:01 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=97) * 22:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) === 2025-12-18 === * 11:13 godog: bump max objects quota to 200k * 11:05 godog: bump object quota to 500G === 2025-12-17 === * 17:54 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Loki 3.6.3, Alloy 1.12.1 ([[phab:T399313|T399313]]) * 17:53 volans@cloudcumin1001: Updating container image 
docker-registry.svc.toolforge.org/grafana/alloy:v1.12.1 ([[phab:T399313|T399313]]) * 17:53 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.6.3 ([[phab:T399313|T399313]]) * 17:53 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.6.3, Alloy 1.12.1 ([[phab:T399313|T399313]]) === 2025-12-15 === * 13:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 13:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 13:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 13:26 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component wmcs-k8s-metrics * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 12:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T412695|T412695]]) * 12:01 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/kube-state-metrics:v2.17.0 ([[phab:T412695|T412695]]) * 12:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T412695|T412695]]) === 2025-12-14 === * 02:14 andrewbogott: running 'kubectl rollout restart -n envvars-admission deployment/envvars-admission' in response to an envvars alert === 2025-12-11 === * 16:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-12-04 === * 21:37 andrew@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 21:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 21:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 21:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 20:52 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 20:45 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 20:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 20:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 20:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 20:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 20:03 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 19:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 19:56 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 19:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 19:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 19:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 19:13 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 19:06 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) === 
2025-12-03 === * 19:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 19:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 17:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 17:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T375217|T375217]]) * 17:32 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 17:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T375217|T375217]]) === 2025-12-02 === * 20:22 andrewbogott: stop/starting harbordb1 to fix presumed mtu mismatch * 20:06 andrewbogott: rebooting tools-harbordb1 to aid with host draining * 08:31 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Alloy 1.11.3 ([[phab:T399313|T399313]]) * 08:30 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.11.3 ([[phab:T399313|T399313]]) * 08:30 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Alloy 1.11.3 ([[phab:T399313|T399313]]) === 2025-12-01 === * 22:31 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Alloy 1.4.0 ([[phab:T399313|T399313]]) * 22:30 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.4.0 ([[phab:T399313|T399313]]) * 22:30 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Alloy 1.4.0 ([[phab:T399313|T399313]]) * 16:46 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 16:26 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
infra-tracing === 2025-11-27 === * 12:15 volans: [continue] on the haproxy nodes * 12:15 volans: temporarily disabling puppet to deploy gerrit {{Gerrit|1211610}} === 2025-11-26 === * 14:48 volans: enabled infra-tracing-nfs on all nfs workers after testing it on a few hosts * 09:46 dhinus: restarting tools-db-6 to apply a config change [[phab:T409922|T409922]] === 2025-11-25 === * 02:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 02:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 02:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 02:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 01:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 01:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 01:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 01:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2025-11-24 === * 10:24 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 10:04 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing === 2025-11-20 === * 18:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-34 * 18:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-34 * 17:13 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component 
infra-tracing * 16:55 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing * 16:45 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging * 16:36 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 15:56 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging * 15:51 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 15:47 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 15:37 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:37 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 15:28 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:23 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 15:17 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:01 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 14:54 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 14:46 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 14:40 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-11-19 === * 16:19 andrewbogott: increased object count quota to 100,000 * 16:03 andrewbogott: increased object storage quota to 200GB * 08:04 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 08:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2025-11-18 === * 18:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-harbor-2 (cluster eqiad1) * 14:36 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-harbor-2 (cluster eqiad1) * 14:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-prometheus-9 (cluster eqiad1) * 14:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-prometheus-9 (cluster eqiad1) * 14:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-113 (cluster eqiad1) * 14:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-113 (cluster eqiad1) * 14:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-113 * 14:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-113 * 14:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-112 (cluster eqiad1) * 14:30 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-112 (cluster eqiad1) * 14:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-112 * 14:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-112 * 14:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm 
tools-k8s-worker-nfs-82 (cluster eqiad1) * 14:28 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-nfs-82 (cluster eqiad1) * 14:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-82 * 14:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-82 * 14:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-nfs-81 (cluster eqiad1) * 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-nfs-81 (cluster eqiad1) * 14:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-81 * 14:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-81 * 14:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-nfs-80 (cluster eqiad1) * 14:23 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-nfs-80 (cluster eqiad1) * 14:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-80 * 14:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-80 * 14:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-prometheus-8 (cluster eqiad1) * 14:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-prometheus-8 (cluster eqiad1) * 12:05 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:56 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 11:24 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) 
for component ingress-admission * 11:19 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-legacy-redirector-3 (cluster eqiad1) * 10:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-legacy-redirector-3 (cluster eqiad1) * 10:11 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.vps.instance.stop_start (exit_code=97) vm tools-legaci-redirector-3 (cluster eqiad1) * 10:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-legaci-redirector-3 (cluster eqiad1) === 2025-11-17 === * 18:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 18:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 18:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 18:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:10 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor ([[phab:T409981|T409981]]) * 10:06 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T409981|T409981]]) === 2025-11-14 === * 16:27 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-4 ([[phab:T409287|T409287]]) * 16:26 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-4 ([[phab:T409287|T409287]]) === 2025-11-13 === * 15:13 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.instance.stop_start (exit_code=99) vm toolsbeta-test-k8s-ingress-12 (cluster eqiad1) * 15:13 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm 
toolsbeta-test-k8s-ingress-12 (cluster eqiad1) * 15:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-bastion-14 (cluster eqiad1) * 15:08 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-bastion-14 (cluster eqiad1) * 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-haproxy-7 (cluster eqiad1) * 12:25 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-haproxy-7 (cluster eqiad1) * 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-haproxy-8 (cluster eqiad1) * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-haproxy-8 (cluster eqiad1) === 2025-11-12 === * 15:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 15:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission === 2025-11-11 === * 15:28 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Loki 3.5.7 ([[phab:T399313|T399313]]) * 15:28 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.5.7 ([[phab:T399313|T399313]]) * 15:28 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.5.7 ([[phab:T399313|T399313]]) === 2025-11-10 === * 22:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 22:16 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) * 22:16 raymond-ndibe@cloudcumin1001: Updating container image toolsbeta-harbor.wmcloud.org/toolforge-pre-built/toolforge-bookworm-sssd:latest * 22:16 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.image.copy_to_registry * 22:14 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) * 22:14 raymond-ndibe@cloudcumin1001: Updating container image toolsbeta-harbor.wmcloud.org/toolforge-pre-built/toolforge-bookworm-sssd:latest * 22:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 22:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) * 22:08 raymond-ndibe@cloudcumin1001: Updating container image toolsbeta-harbor.wmcloud.org/toolforge-pre-built/toolforge-bookworm-sssd:latest:latest * 22:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 22:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 21:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 21:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 21:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config * 21:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 21:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 20:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 20:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 20:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 20:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:10 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 19:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 19:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 19:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 19:44 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 19:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 19:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 19:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 19:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 19:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 19:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 19:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 18:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 18:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:32 
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-11-07 === * 11:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 11:45 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 11:42 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T409287|T409287]]) * 11:35 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T409287|T409287]]) * 11:34 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-7 ([[phab:T409287|T409287]]) * 11:33 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-7 ([[phab:T409287|T409287]]) === 2025-11-06 === * 16:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-11-05 === * 19:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 18:40 taavi: taavi@tools-bastion-15:~ $ sudo loginctl terminate-user damian * 14:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 14:53 fnegri@cloudcumin1001: START - Cookbook 
wmcs.vps.refresh_puppet_certs on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 14:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T409287|T409287]]) * 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T409287|T409287]]) === 2025-11-04 === * 17:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 17:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 17:26 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 17:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 15:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 12:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 03:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 01:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-cli * 01:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-cli * 01:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-cli * 00:54 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-cli * 00:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 00:39 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 00:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 00:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 00:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 00:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 00:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli === 2025-11-03 === * 22:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 22:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 22:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 22:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 22:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 22:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 22:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 22:13 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 22:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 22:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 22:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 22:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 21:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 21:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 21:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 21:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 21:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 21:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 21:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 20:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 20:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 20:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component 
jobs-emailer * 20:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 18:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 18:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 11:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 11:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 11:11 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 11:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli === 2025-10-30 === * 18:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 16:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 11:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 11:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 11:22 
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2025-10-29 === * 18:39 taavi: kick off script to rebuild all pre-built images, including [[phab:T407707|T407707]] * 16:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T408669|T408669]]) * 16:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T408669|T408669]]) * 16:27 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico ([[phab:T408669|T408669]]) * 15:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T408669|T408669]]) * 14:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 12:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.calico.copy_images_to_registry (exit_code=0) for Calico v3.29.6 * 12:48 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/typha:v3.29.6 * 12:47 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/node:v3.29.6 * 12:47 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/kube-controllers:v3.29.6 * 12:46 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/ctl:v3.29.6 * 12:46 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/cni:v3.29.6 * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.calico.copy_images_to_registry for Calico v3.29.6 * 12:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 12:41 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component calico * 12:37 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 12:22 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico * 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico === 2025-10-28 === * 19:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:07 taavi: delete paws, paws-master security groups, long obsolete as paws is now in a separate project * 16:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 14:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 14:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 10:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:37 
taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2025-10-27 === * 22:09 taavi: copy toolviews database hiera data to a place where haproxy nodes can see them [[phab:T408454|T408454]] * 18:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 11:16 dcaro: removing taskruns/pipelineruns v1beta1 version from the stored list in the crds ([[phab:T408127|T408127]]) === 2025-10-24 === * 20:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-33, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-35, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-41, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-44, t * 18:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-33, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-35, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-41, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-44, tools-k8s-worker-nfs * 18:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-21, 
tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-27 * 17:55 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-27 * 17:36 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-9, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-15, tools-k * 16:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-9, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-16, to * 16:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-105, tools-k8s-worker-106 * 16:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-105, tools-k8s-worker-106 * 16:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103 * 16:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103 * 16:19 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-102,tools-k8s-worker-103 * 16:19 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-102,tools-k8s-worker-103 * 13:37 andrewbogott: rebooting clouddumps100[12] for [[phab:T407110|T407110]] === 2025-10-23 === * 14:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 10:11 taavi: deleting old nginx front proxy instances [[phab:T283948|T283948]] * 10:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-10-22 === * 15:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 15:56 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.13.3 * 15:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 15:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 15:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 12:35 taavi: moving toolforge traffic to haproxy directly [[phab:T283948|T283948]] * 07:00 godog: delete tools-nfs-2 - [[phab:T404584|T404584]] === 2025-10-21 === * 08:53 godog: shut down tools-nfs-2 - [[phab:T404584|T404584]] * 07:52 godog: tools-nfs-3 is back - [[phab:T404584|T404584]] * 07:49 godog: resize tools-nfs-3 to match tools-nfs-2 (g4.cores16.ram64.disk20.10xiops) - [[phab:T404584|T404584]] * 00:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component components-api * 00:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-10-20 === * 16:31 taavi: make logrotate run hourly on haproxy nodes [[phab:T284558|T284558]] === 2025-10-16 === * 12:01 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:52 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 08:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-10-15 === * 08:03 godog: tools-nfs-3 is back - [[phab:T404584|T404584]] * 08:00 godog: resize tools-nfs-3 - [[phab:T404584|T404584]] === 2025-10-14 === * 14:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 14:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 13:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 13:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 11:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 11:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 11:45 godog: update nfs-tools.wmcloud.org and nfs.svc.toolforge.org proxied to point to tools-nfs-3 === 2025-10-13 === * 14:19 
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:17 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-70, tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-72, tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-75, tools-k8s-worker-nfs-76, tools-k8s-worker-nfs-77, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-79, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-80, tools-k8s-worker-nfs-81, tools-k8s-worker-nfs-82, too * 09:14 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:14 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:14 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-81 (cluster eqiad1, project tools) * 09:14 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-81 (cluster eqiad1, project tools) * 09:13 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:13 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:09 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-69 (cluster eqiad1, project tools) * 09:09 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-69 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-68 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-68 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm 
tools-k8s-worker-nfs-67 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-67 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-66 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-66 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-65 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-65 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-61 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-61 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-58 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-58 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-57 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-57 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-55 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-55 (cluster eqiad1, project tools) * 09:05 
filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-54 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-54 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-53 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-53 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-50 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-50 (cluster eqiad1, project tools) * 09:03 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-5 (cluster eqiad1, project tools) * 09:03 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-5 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-48 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-48 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-47 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-47 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-46 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: START - Cookbook 
wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-46 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-45 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-45 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-44 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-44 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-43 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-43 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-42 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-42 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-41 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-41 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-40 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-40 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-39 
(cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-39 (cluster eqiad1, project tools) * 08:59 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-38 (cluster eqiad1, project tools) * 08:59 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-38 (cluster eqiad1, project tools) * 08:58 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:58 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:57 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:57 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:10 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:05 wmbot~godog@r5: END (FAIL) - Cookbook wmcs.nfs.migrate_service (exit_code=99) ([[phab:T404584|T404584]]) * 08:05 wmbot~godog@r5: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:04 filippo@cloudcumin1001: END (FAIL) - Cookbook wmcs.nfs.migrate_service (exit_code=99) ([[phab:T404584|T404584]]) * 08:03 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:03 filippo@cloudcumin1001: END (FAIL) - Cookbook wmcs.nfs.migrate_service (exit_code=99) ([[phab:T404584|T404584]]) * 08:03 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:01 godog: switch NFS from tools-nfs-2 to tools-nfs-3 - [[phab:T404584|T404584]] * 07:29 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for 
tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-66, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-76, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-81 === 2025-10-10 === * 09:22 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-1 * 09:10 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-1 === 2025-10-09 === * 08:21 filippo@cloudcumin1001: END (FAIL) - Cookbook wmcs.nfs.add_server (exit_code=99) * 08:15 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.add_server === 2025-10-08 === * 21:47 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-76 * 21:19 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-76 * 12:27 godog: very brief nfs interruption to wrap up [[phab:T347681|T347681]] * 10:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 09:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 08:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 08:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 06:55 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-71 * 06:43 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-71 === 2025-10-07 === * 18:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 16:18 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-11 * 16:11 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-11 * 15:18 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-69 * 15:11 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-69 * 14:51 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-65, tools-k8s-worker-nfs-69 * 14:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-65, tools-k8s-worker-nfs-69 * 13:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 13:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 12:59 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39 * 12:58 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39 * 12:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 12:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
envvars-admission * 11:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 10:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 09:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 08:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 08:08 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api * 08:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2025-10-06 === * 12:06 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-7 * 11:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-7 * 08:19 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found * 08:19 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 08:18 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-76 * 07:39 wmbot~dcaro@acme: START - Cookbook 
wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-76 === 2025-10-03 === * 12:51 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-48 * 12:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-6 * 12:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-6 * 12:45 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-48 * 09:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-5 * 09:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-5 === 2025-10-02 === * 13:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) * 13:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 13:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-haproxy-7 * 13:23 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-haproxy-7 * 13:23 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=99) * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 09:12 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-55 * 08:52 wmbot~dcaro@acme: START - Cookbook 
wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-55 === 2025-10-01 === * 10:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 10:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2025-09-30 === * 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 08:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 08:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 08:19 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-67 * 08:06 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-67 === 2025-09-29 === * 13:23 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found * 13:23 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 11:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:39 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 
10:35 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:34 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:34 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 07:00 godog: kick stuck nfs workers from clouddumps1001 === 2025-09-28 === * 08:54 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:35 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:15 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:13 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:12 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-1 
([[phab:T405850|T405850]]) * 08:10 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-1 ([[phab:T405850|T405850]]) * 08:10 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 08:08 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:08 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 08:08 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:08 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 08:08 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:07 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 08:07 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2025-09-25 === * 18:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 17:42 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:30 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:27 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 17:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 15:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:15 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api * 12:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 12:04 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 11:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-09-24 === * 20:25 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14 * 20:19 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14 * 17:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 17:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 17:32 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 17:30 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-43 * 17:28 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-43 * 17:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 17:10 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-43 * 17:07 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-43 * 16:57 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for 
tools-k8s-worker-nfs-73 ([[phab:T400957|T400957]]) * 16:50 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-73 ([[phab:T400957|T400957]]) * 13:49 dcaro: patched all tools with new resource defaults, everything looks good * 13:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:09 dcaro: deployed jobs-api change to default resources, patching existing jobs * 13:08 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-cli * 13:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 12:36 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 12:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 12:28 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 12:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 12:11 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 12:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 03:54 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12 * 03:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12 === 2025-09-23 === * 20:08 andrewbogott: creating puppetdbpostgres and adding it to tools-puppetdb-2 to store postgres data; the root volume of that VM was filling up and causing widespread puppet issues * 01:55 andrew@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 * 01:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 === 2025-09-22 === * 16:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 12:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-09-21 === * 09:17 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-2 * 09:02 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-2 * 03:16 dcaro: acking and silencing CPU capacity alerts to handle on Monday, they should not page * 01:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 01:46 andrew@cloudcumin1001: Added a new k8s worker tools-k8s-worker-113.tools.eqiad1.wikimedia.cloud to the cluster * 01:36 andrewbogott: adding additional worker node in response to repeated capacity alerts * 01:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster === 2025-09-19 === * 13:09 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-11 * 13:03 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-11 === 2025-09-18 === * 13:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 13:42 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component components-api * 11:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 11:45 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 11:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 11:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 11:29 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-prometheus-9 (cluster eqiad1, project tools) * 11:29 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot vm tools-prometheus-9 (cluster eqiad1, project tools) * 11:29 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:29 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 09:42 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 09:36 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 09:34 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found * 09:34 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 08:52 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-3 * 08:34 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-26, 
tools-k8s-worker-nfs-3 * 06:47 taavi: delete tools-sgebastion-10 [[phab:T314665|T314665]] === 2025-09-17 === * 13:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-32 * 12:53 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-32 * 09:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:35 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-prometheus-9 (cluster eqiad1, project tools) * 09:35 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot vm tools-prometheus-9 (cluster eqiad1, project tools) * 09:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:34 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:23 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-66, tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-10 * 08:08 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-66, tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-10 === 2025-09-16 === * 16:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 16:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 16:21 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 16:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:57 taavi: delete tools-sgebastion puppet prefix 
[[phab:T314665|T314665]] * 15:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:26 taavi: shutdown tools-sgebastion-10 [[phab:T314665|T314665]] * 14:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:15 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-bastion-13 * 14:09 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-bastion-13 * 14:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 13:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-bastion-12 * 13:28 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-bastion-12 * 07:11 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-75 * 06:57 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-75 * 06:57 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 06:51 filippo@cloudcumin1001: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2025-09-15 === * 16:22 
taavi: reboot old bastions to kick long-living connections into newer ones
* 14:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 14:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 14:09 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
* 14:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 12:47 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-66
* 12:35 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-66
=== 2025-09-12 ===
* 08:49 taavi: pointing login.toolforge.org to tools-bastion-15 [[phab:T392510|T392510]]
* 08:33 taavi: pointing dev.toolforge.org to tools-bastion-14 [[phab:T392510|T392510]]
* 07:14 godog: uncordon tools-k8s-worker-nfs-53 after failed cookbook (?) yesterday
=== 2025-09-11 ===
* 14:42 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-46
* 14:36 godog: drain/reboot tools-k8s-worker-nfs-46 - [[phab:T404322|T404322]]
* 14:36 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-46
* 14:22 andrewbogott: actually I didn't drain tools-k8s-worker-nfs-53 because the alert cleared on its own
* 14:21 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-53
* 14:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-53
* 14:21 andrewbogott: draining/rebooting tools-k8s-worker-nfs-53 because of procs in D state
* 13:42 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-53
* 13:36 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-53
* 08:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-bastion-14.tools.eqiad1.wikimedia.cloud
* 07:59 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-14.tools.eqiad1.wikimedia.cloud
* 07:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-bastion-15.tools.eqiad1.wikimedia.cloud
* 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-15.tools.eqiad1.wikimedia.cloud
* 07:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase
=== 2025-09-10 ===
* 14:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 14:28 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T403964|T403964]])
* 14:26
dcaro@cloudcumin1001: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:10 dcaro@cloudcumin1001: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:08 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:45 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:31 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:31 fnegri@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:30 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T403964|T403964]]) === 2025-09-09 === * 09:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 09:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 08:55 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) ([[phab:T404047|T404047]]) * 08:55 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot ([[phab:T404047|T404047]]) === 2025-09-08 === * 15:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config * 15:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 15:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:16 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.run_tests (exit_code=97) * 12:07 dcaro@cloudcumin1001: START - 
Cookbook wmcs.toolforge.run_tests * 12:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 11:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 11:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 11:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 11:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 11:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions (exit_code=0) for tools-bastion-12.tools.eqiad1.wikimedia.cloud, tools-bastion-13.tools.eqiad1.wikimedia.cloud ([[phab:T402378|T402378]]) * 11:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions for tools-bastion-12.tools.eqiad1.wikimedia.cloud, tools-bastion-13.tools.eqiad1.wikimedia.cloud ([[phab:T402378|T402378]]) * 11:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses (exit_code=0) for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T402378|T402378]]) * 11:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T402378|T402378]]) * 11:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 10:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 10:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, 
tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 10:20 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 10:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 10:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 10:06 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 10:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, 
tools-k * 10:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 ([[phab:T402378|T402378]]) * 10:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 10:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 09:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 ([[phab:T402378|T402378]]) * 09:58 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-68: ([[phab:T402378|T402378]]) * 09:58 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68: ([[phab:T402378|T402378]]) * 09:55 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:50 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:46 
dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 09:42 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, 
tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:40 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=97) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-w * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:38 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:37 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 09:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 
09:32 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, 
tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 09:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 09:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 09:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 08:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 08:58 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 08:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 08:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:40 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.29.15 to 1.30.14 
([[phab:T402378|T402378]]) * 08:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:32 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:19 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:09 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:09 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.run_tests (exit_code=97) * 08:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests === 2025-09-06 === * 23:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for 
tools-k8s-worker-nfs-35 * 23:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-35 === 2025-09-05 === * 14:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-09-04 === * 19:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 08:52 dcaro: added 'disable-ssl' to tools replica.my.cnf === 2025-09-03 === * 17:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:02 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 16:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 15:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:09 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:09 dcaro@cloudcumin1001: START - 
Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 13:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 12:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 12:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 12:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 11:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 11:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 10:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 10:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 09:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 08:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 08:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 08:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-09-02 === * 17:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 17:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 16:46 
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico
* 16:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico
* 15:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 15:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 15:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 13:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 12:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 12:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2025-08-29 ===
* 15:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance abogott-nstesting
* 15:08 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance abogott-nstesting
=== 2025-08-28 ===
* 16:52 taavi: rebuild tcl, mariadb images on top of trixie [[phab:T400256|T400256]]
* 08:48 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 08:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2025-08-27 ===
* 18:05 taavi: copy missing aptly packages to trixie-<nowiki>{</nowiki>tools,toolsbeta<nowiki>}</nowiki>
* 11:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 11:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2025-08-26 ===
* 13:42 dcaro: extended object storage quota to 100G ([[phab:T402923|T402923]])
* 10:25 dhinus: shut down tools-harbor-1 (no longer used)
=== 2025-08-25 ===
* 22:28 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-81
* 22:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-81
=== 2025-08-21 ===
* 12:28 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers
* 10:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 10:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 07:31 godog: reboot nfs workers to reset processes stuck in D state
* 07:28 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers
* 04:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-80
* 03:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-80
=== 2025-08-20 ===
* 08:09 dcaro: deploy wmcs-k8s-metrics upgrade ([[phab:T362869|T362869]])
=== 2025-08-19 ===
* 15:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 15:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 15:08 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component maintain-harbor
* 15:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 15:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook
wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:57 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder * 14:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:50 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api * 14:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:48 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:48 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder * 14:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:46 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:44 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:42 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 14:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:37 dcaro: flipped the tools-harbor.wmcloud.org endpoint to point to tools-harbor-2 ([[phab:T350687|T350687]]) * 14:22 Raymond_Ndibe: setting tools-harbor-1 as read-only ([[phab:T350687|T350687]]) * 13:24 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 13:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:21 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 13:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:18 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 13:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:18 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 13:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 09:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-18 === * 21:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 21:07 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362869|T362869]]) * 17:49 dcaro@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/kube-state-metrics:v2.16.0 ([[phab:T362869|T362869]]) * 17:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362869|T362869]]) * 17:48 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362869|T362869]]) * 17:48 dcaro@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/metrics-server:v0.7.2 ([[phab:T362869|T362869]]) * 17:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362869|T362869]]) * 17:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 17:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 17:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 16:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-harbor-2.tools.eqiad1.wikimedia.cloud ([[phab:T350687|T350687]]) * 16:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud ([[phab:T350687|T350687]]) * 16:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 08:35 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-harbor-2.tools.eqiad1.wikimedia.cloud * 08:34 wmbot~dcaro@acme: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud === 2025-08-16 === * 21:16 andrew@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-111 * 21:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-111 * 21:13 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-111 * 21:13 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-111 === 2025-08-15 === * 19:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103 * 19:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103 * 19:21 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-103 * 19:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-103 === 2025-08-14 === * 15:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 15:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 15:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:38 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-107, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-41 * 11:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-107, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-41 * 11:33 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found * 11:33 wmbot~dcaro@acme: START 
- Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 02:31 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 02:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-13 === * 16:36 dcaro: reverting jobs-api release ([[phab:T401846|T401846]]) * 11:18 taavi: delete tools-prometheus-6, shutdown for a while * 08:51 godog: bounce stashbot * 08:33 godog: refresh machine-id on tools-k8s-worker-[102-103,105-112].tools.eqiad1.wikimedia.cloud,tools-k8s-worker-nfs-[1-3,5,7-14,16-17,19,21-24,26-27,32-48,50,53-55,57-58,61,65-82].tools.eqiad1.wikimedia.cloud === 2025-08-12 === * 16:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config * 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 15:34 taavi: building initial trixie based images [[phab:T400255|T400255]] * 12:50 dcaro: redeploy kyverno ([[phab:T394787|T394787]]) * 12:49 dcaro: manually migrate cleanuppolicies.kyverno.io and clustercleanuppolicies.kyverno.io (using kyverno cli) ([[phab:T394787|T394787]]) * 10:01 dcaro: starting upgrade for kyverno ([[phab:T394787|T394787]]) * 10:00 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:54 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:53 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:53 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:52 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for 
tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:52 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 03:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 03:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli === 2025-08-11 === * 12:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 12:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 12:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 10:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 08:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-harbor-2.tools.eqiad1.wikimedia.cloud * 08:44 dcaro@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud * 08:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-58 * 08:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-58 === 2025-08-08 === * 06:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 06:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-07 === * 14:26 andrew@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-67 * 14:20 andrewbogott: draining and rebooting tools-k8s-worker-nfs-67 * 14:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67 * 10:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-06 === * 17:54 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 17:53 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 17:41 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component webservice-cli ([[phab:T401014|T401014]]) * 17:41 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli ([[phab:T401014|T401014]]) * 17:39 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component webservice-cli ([[phab:T401014|T401014]]) * 17:38 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli ([[phab:T401014|T401014]]) * 17:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-05 === * 16:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 16:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:03 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 13:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 11:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-24 * 11:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-24 * 09:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 08:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 03:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 03:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 03:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 03:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 03:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 03:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 02:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 02:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 02:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 02:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 02:39 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component registry-admission * 02:36 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 02:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 02:34 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 02:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 02:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 02:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 02:30 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component maintain-harbor * 02:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 02:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 02:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 02:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 02:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 02:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 01:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 01:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 01:10 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 01:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 01:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 01:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 00:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission === 2025-08-04 === * 13:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'filippo' in role 'member' ([[phab:T401091|T401091]]) * 13:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.add_user_to_project for user 'filippo' in role 'member' ([[phab:T401091|T401091]]) * 11:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-79, tools-k8s-worker-nfs-2 * 11:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-79, tools-k8s-worker-nfs-2 === 2025-08-01 === * 03:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-cli === 2025-07-31 === * 16:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging * 16:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 15:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy 
for component jobs-api * 04:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 04:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 04:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 04:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 04:17 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 04:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 04:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 04:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 04:05 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli * 04:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 04:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 04:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli === 2025-07-30 === * 08:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 08:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-07-29 === * 16:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:02 
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 * 15:56 andrewbogott: draining and restarting tools-k8s-worker-nfs-74 * 15:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 * 15:44 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58, tools-k8s-worker-nfs-32 * 15:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58, tools-k8s-worker-nfs-32 * 15:32 andrewbogott: draining and restarting tools-k8s-worker-nfs-58 and tools-k8s-worker-nfs-32 * 14:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 14:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 13:16 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:16 wmbot~dcaro@acme: Added a new k8s worker-nfs tools-k8s-worker-nfs-82.tools.eqiad1.wikimedia.cloud to the cluster * 13:06 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:06 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 13:06 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:05 wmbot~dcaro@acme: Added a new k8s worker-nfs tools-k8s-worker-nfs-81.tools.eqiad1.wikimedia.cloud to the cluster * 12:53 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:53 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the 
tools cluster * 12:53 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:46 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:40 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:40 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 12:40 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.quota_increase * 12:39 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:38 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:31 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:31 wmbot~dcaro@acme: Added a new k8s worker-nfs tools-k8s-worker-nfs-80.tools.eqiad1.wikimedia.cloud to the cluster * 12:22 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:22 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:22 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:22 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:18 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:00 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a 
worker-nfs role in the tools cluster * 09:54 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:54 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 09:54 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:54 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=97) * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:29 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:29 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:29 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:28 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) * 09:28 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:07 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 09:02 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:02 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 09:01 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 08:59 wmbot~dcaro@acme: Added a new k8s worker tools-k8s-worker-112.tools.eqiad1.wikimedia.cloud to the cluster * 08:49 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools 
cluster * 08:49 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 08:49 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 08:15 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 08:15 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 08:15 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 08:14 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster === 2025-07-28 === * 20:28 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 20:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:25 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 20:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:24 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 20:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:23 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 20:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 19:56 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 19:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 19:49 raymond-ndibe@cloudcumin1001: END (FAIL) - 
Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 19:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 19:44 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 19:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 11:58 taavi: update pywikibot image to 10.2.0 [[phab:T396933|T396933]] === 2025-07-26 === * 07:16 wmbot~root@toolforge: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli * 07:16 wmbot~root@toolforge: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli === 2025-07-23 === * 18:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 18:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor === 2025-07-21 === * 17:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 * 17:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 === 2025-07-19 === * 13:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 13:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 === 2025-07-18 === * 10:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-44 * 10:34 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-44 === 2025-07-14 === * 12:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-78 * 12:39 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-78
=== 2025-07-13 ===
* 03:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-12
* 03:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-12
=== 2025-07-11 ===
* 17:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55
* 09:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-77, tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-37
* 09:25 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-77, tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-37
=== 2025-07-09 ===
* 17:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 17:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 14:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli
* 14:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli
* 10:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli
* 10:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli
* 09:55 dcaro: adding arch arm64 to all toolforge repos ([[phab:T398016|T398016]])
* 09:40 dcaro: added arch arm64 to jessie-tools repo ([[phab:T398016|T398016]])
=== 2025-07-08 ===
* 17:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 17:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 15:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 15:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 12:42 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
* 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
=== 2025-07-07 ===
* 17:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-53
* 17:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-53
* 16:39 dcaro: pushed new ci image docker-registry.svc.toolforge.org/cloud-cicd-py3.11-bookworm-tox:latest
* 16:05 dcaro: clearing images from tools-imagebuilder-2 as it's out of space
* 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 11:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
* 08:26 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
* 08:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
=== 2025-07-06 ===
* 16:38 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75, tools-k8s-worker-nfs-8
* 16:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75, tools-k8s-worker-nfs-8
=== 2025-07-05 ===
* 00:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-57
* 00:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-57
* 00:31 andrewbogott: restarting tools-k8s-worker-nfs-55 tools-k8s-worker-nfs-47 tools-k8s-worker-nfs-57, too many D state procs
=== 2025-07-04 ===
* 14:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-24
* 14:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-24
* 13:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36
* 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36
=== 2025-07-03 ===
* 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 16:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 14:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
* 14:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli
* 13:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 13:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 13:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging
* 13:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
* 13:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24
* 13:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24
* 10:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 10:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 08:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging
* 08:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
* 08:26 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
* 08:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
=== 2025-07-02 ===
* 13:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-55
* 13:30 andrewbogott: restarting stuck tools tools-k8s-worker-nfs-74 tools-k8s-worker-nfs-39 tools-k8s-worker-nfs-55
* 13:30 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-55
* 10:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 10:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 09:56 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 09:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 09:16 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 09:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2025-07-01 ===
* 16:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 15:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 15:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 15:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging
* 15:23 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission
* 15:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 15:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
* 15:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
* 15:15 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
* 14:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 14:31 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-5 ([[phab:T398170|T398170]])
* 14:30 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-5 ([[phab:T398170|T398170]])
* 14:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 14:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 14:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 13:51 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 13:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 13:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor
* 13:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 13:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 12:51 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 12:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 12:03 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission
* 11:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 11:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 11:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 10:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 10:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 10:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 10:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 09:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 09:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 09:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
=== 2025-06-30 ===
* 23:01 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-14
* 22:50 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-14
* 13:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-69, tools-k8s-worker-nfs-70
* 13:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-69, tools-k8s-worker-nfs-70
* 10:51 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T398170|T398170]])
* 10:47 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T398170|T398170]])
* 10:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T398170|T398170]])
* 10:46 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T398170|T398170]])
* 10:46 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' ([[phab:T398170|T398170]])
* 10:45 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T398170|T398170]])
* 10:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T398170|T398170]])
* 10:45 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T398170|T398170]])
* 10:44 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' ([[phab:T398170|T398170]])
* 10:43 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T398170|T398170]])
=== 2025-06-28 ===
* 10:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-24
* 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-24
* 10:13 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67,tools-k8s-worker-nfs-43,tools-k8s-worker-nfs-22,tools-k8s-worker-nfs-5,tools-k8s-worker-nfs-24
* 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67,tools-k8s-worker-nfs-43,tools-k8s-worker-nfs-22,tools-k8s-worker-nfs-5,tools-k8s-worker-nfs-24
* 10:12 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67
* 10:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67
* 10:12 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-67
* 10:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67
* 10:08 dcaro: left a tmux running with a script to restart nginx if stuck
* 09:59 dcaro: restarted nginx in tools-static
=== 2025-06-27 ===
* 18:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-46
* 17:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-46
=== 2025-06-26 ===
* 16:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 16:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 16:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 16:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 14:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 14:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
* 13:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli
* 12:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 12:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2025-06-25 ===
* 18:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 18:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 17:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 17:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 13:52 chuckonwumelu@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
* 13:50 chuckonwumelu@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli
* 11:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
* 11:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli
* 02:18 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-38
* 02:07 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-38
=== 2025-06-24 ===
* 16:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 16:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 15:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-33
* 15:06 andrewbogott: rebooting tools-k8s-worker-nfs-33, stuck processes
* 15:06 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33
* 15:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 15:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 12:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 12:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 12:22 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
* 12:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2025-06-23 ===
* 09:08 taavi: restrict logging in to tools-sgebastion-10 (aka login-buster) [[phab:T397459|T397459]]
=== 2025-06-22 ===
* 00:09 andrewbogott: rebooting tools-prometheus-8
=== 2025-06-21 ===
* 16:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-12
* 15:58 andrewbogott: rebooting tools-k8s-worker-nfs-54 tools-k8s-worker-nfs-12, lots of D state
* 15:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-12
* 10:09 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 09:27 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 09:27 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
* 09:26 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console
=== 2025-06-19 ===
* 18:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers
* 17:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 17:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 17:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 17:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 13:56 dcaro: reboot tools-sgebastion-10 as it's stuck on NFS for some tools
=== 2025-06-18 ===
* 14:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 14:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 04:22 andrewbogott: rebooting tools-prometheus-8; unreachable
=== 2025-06-16 ===
* 17:41 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli
* 17:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
* 12:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39
* 12:39 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39
=== 2025-06-14 ===
* 16:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37
* 16:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37
=== 2025-06-12 ===
* 10:36 dcaro: rebooting tools-prometheus-8 due to the VM having load issues (not responding to ssh)
* 10:34 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 10:28 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console
=== 2025-06-11 ===
* 13:39 chuckonwumelu@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 13:33 chuckonwumelu@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
* 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Loki 3.5.0, Alloy 1.9.1
* 11:18 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.9.1
* 11:18 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.5.0
* 11:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.5.0, Alloy 1.9.1
* 11:09 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=99) for Loki 3.5.0, Alloy 1.9.1
* 11:09 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.5.0
* 11:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.5.0, Alloy 1.9.1
=== 2025-06-10 ===
* 17:04 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 17:00 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 16:41 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 16:28 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 16:26 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer
* 16:21 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 15:45 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 15:33 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 15:21 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 15:15 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 14:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 14:57 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 11:48 taavi: add AAAA records to tools/toolsbeta-harbor proxies, previous monitoring issues resolved
=== 2025-06-06 ===
* 21:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-74
* 21:40 andrewbogott: restarting tools-prometheus-9 and tools-prometheus-8, lots of tools metrics just went dark
* 21:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-74
* 18:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 18:20 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 15:20 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-5
* 15:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-5
=== 2025-06-05 ===
* 22:24 andrewbogott: running /srv/tools/cleanup.sh on tools-nfs-2 in a screen session, trying to clear disk space alert
* 15:06 chuckonwumelu@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 14:53 chuckonwumelu@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
=== 2025-05-30 ===
* 16:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 16:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-46
* 15:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-46
* 15:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-11
* 15:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 15:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 15:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-11
* 15:28 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component components-api
* 15:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 07:38 taavi: reboot tools-static-15 to unstuck NFS things
=== 2025-05-24 ===
* 12:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-65
* 12:50 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-65
=== 2025-05-23 ===
* 16:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-65
* 16:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-65
* 03:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-43
* 02:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-43
=== 2025-05-22 ===
* 21:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 21:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 21:17 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-45, tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-55
* 20:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-45, tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-55
* 20:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 19:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 19:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-53, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-21
* 19:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 19:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 19:26 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 19:15 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-53, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-21
* 19:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 18:15 dcaro: restart tools-static nginx due to nfs hiccup
* 08:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-8
* 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-8
* 08:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-7
* 08:01 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-7
* 07:58 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=1) for instance toolsbeta-prometheus-1
* 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance toolsbeta-prometheus-1
* 07:33 taavi: add AAAA record on *.toolforge.org [[phab:T211575|T211575]]
=== 2025-05-21 ===
* 15:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-proxy-10.tools.eqiad1.wikimedia.cloud
* 15:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-proxy-9.tools.eqiad1.wikimedia.cloud
* 15:24 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-proxy-10.tools.eqiad1.wikimedia.cloud
* 15:24 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-proxy-9.tools.eqiad1.wikimedia.cloud
* 13:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 13:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase
* 09:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-prometheus-9.tools.eqiad1.wikimedia.cloud
* 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-prometheus-9.tools.eqiad1.wikimedia.cloud
* 09:27 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
* 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/busybox:1.35
* 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/bitnami-kubectl:1.30.2
* 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-reports-controller:v1.13.6
* 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-cleanup-controller:v1.13.6
* 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-background-controller:v1.13.6
* 09:25 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyvernopre:v1.13.6
* 09:25 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno-cli:v1.13.6
* 09:25 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno:v1.13.6
* 09:25 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 09:04 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
* 09:04 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/busybox:1.35
* 09:04 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2
* 09:04 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6
* 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6
* 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6
* 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6
* 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6
* 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6
* 09:03 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 08:54 dcaro: deployed the new dns entry for docker-registry.svc.toolforge.org (might take some time to refresh)
* 08:47 dcaro: deleting docker-registry.svc.toolforge.org proxy to use dns entry to floating ip instead
=== 2025-05-20 ===
* 19:40 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
* 19:40 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/busybox:1.35
* 19:40 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2
* 19:40 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6
* 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6
* 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6
* 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6
* 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6
* 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6
* 19:39 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 17:18 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
* 17:18 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/busybox:1.35
* 17:18 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2
* 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6
* 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6
* 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6
* 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6
* 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6
* 17:16 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6
* 17:16 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 16:11 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99)
* 16:11 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno:v1.13.6
* 16:11 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 15:48 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
* 15:48 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/busybox:1.35
* 15:47 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2
* 15:46 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports:v1.13.6
* 15:46 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup:v1.13.6
* 15:45 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background:v1.13.6
* 15:45 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6
* 15:44 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6
* 15:44 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6
* 15:44 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 15:01 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99)
* 15:00 wmbot~dcaro@acme: Updating container image toolforge-kyverno-kyverno:v1.13.6
* 15:00 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 14:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
* 14:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 14:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
* 14:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 14:58 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=97)
* 14:58 wmbot~dcaro@acme: Updating container image toolforge-kyverno-kyverno:v1.13.6
* 14:58 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 13:57 taavi: disable host-based authentication in sshd config, not used since grid shutdown
* 13:08 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-prometheus-7
* 13:07 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-prometheus-7
* 13:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-prometheus-7
* 13:05 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-prometheus-7
* 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-prometheus-8.tools.eqiad1.wikimedia.cloud
* 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-prometheus-8.tools.eqiad1.wikimedia.cloud
* 09:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 09:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase
=== 2025-05-19 ===
* 08:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 08:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2025-05-16 ===
* 18:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 18:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 17:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-9
* 17:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor ([[phab:T394520|T394520]])
* 16:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for
tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-9 * 16:51 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T394520|T394520]]) * 16:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor ([[phab:T394520|T394520]]) * 16:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T394520|T394520]]) * 16:44 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 16:44 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 16:43 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 16:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 12:08 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 12:07 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers === 2025-05-14 === * 17:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy 
for component jobs-api * 17:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 17:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 14:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 10:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 09:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 08:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 08:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2025-05-13 === * 15:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 15:33 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 07:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-36 * 07:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 === 2025-05-12 === * 19:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:18 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 16:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 13:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:23 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 13:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:04 arturo: add container image to docker registry docker-registry.tools.wmflabs.org/tofu-provisioning:20250512 ([[phab:T393686|T393686]]) * 11:51 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 11:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 11:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 10:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 10:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 09:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 09:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 08:59 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 08:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 02:36 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19
* 02:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19
=== 2025-05-10 ===
* 17:35 lucaswerkmeister: root@tools-bastion-13:~# systemctl restart sssd-sudo<nowiki>{</nowiki>,.socket<nowiki>}</nowiki> # looks like the reset-failed didn’t work properly, systemd didn’t even try to start the service again afaict ([[phab:T393732|T393732]])
* 17:34 lucaswerkmeister: root@tools-bastion-13:~# systemctl reset-failed sssd-<nowiki>{</nowiki>pam,sudo<nowiki>}</nowiki>.service && systemctl restart sssd-pam<nowiki>{</nowiki>,-priv<nowiki>}</nowiki>.socket # try to reset the rate limits this way ([[phab:T393732|T393732]])
* 16:22 lucaswerkmeister: systemctl restart sssd-<nowiki>{</nowiki>pam<nowiki>{</nowiki>,-priv<nowiki>}</nowiki>,sudo<nowiki>}</nowiki>.socket # service-start-limit-hit, [[phab:T393732|T393732]]?
* 14:10 lucaswerkmeister: root@tools-bastion-13:~# systemctl restart sssd-sudo.socket # service-start-limit-hit, [[phab:T393732|T393732]]?
* 11:53 lucaswerkmeister: [[phab:T393732|T393732]] note: restart of sssd-pam.service actually failed, “may be requested by dependency only”; overall it still seems to have worked though (so next time restarting the sockets is probably sufficient)
* 11:52 lucaswerkmeister: root@tools-bastion-13:~# systemctl restart sssd-pam<nowiki>{</nowiki>,<nowiki>{</nowiki>,-priv<nowiki>}</nowiki>.socket<nowiki>}</nowiki> # all three failed with start-limit-hit / Start request repeated too quickly; [[phab:T393732|T393732]]?
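
The systemctl entries for 2025-05-10 rely on bash brace expansion, which can be hard to read once the braces are wrapped in nowiki tags. The sketch below only prints the unit names each pattern expands to; it restarts nothing. The helper name `brace_expand` is hypothetical and not part of the logged commands, and the sketch assumes `bash` is available on the PATH.

```shell
# Hypothetical helper: let bash expand a brace pattern and print the result.
# The pattern is passed in single quotes so that only the inner bash expands it.
brace_expand() { bash -c "echo $1"; }

brace_expand 'sssd-sudo{,.socket}'            # sssd-sudo sssd-sudo.socket
brace_expand 'sssd-{pam,sudo}.service'        # sssd-pam.service sssd-sudo.service
brace_expand 'sssd-pam{,-priv}.socket'        # sssd-pam.socket sssd-pam-priv.socket
brace_expand 'sssd-{pam{,-priv},sudo}.socket' # sssd-pam.socket sssd-pam-priv.socket sssd-sudo.socket
brace_expand 'sssd-pam{,{,-priv}.socket}'     # sssd-pam sssd-pam.socket sssd-pam-priv.socket
```

The last pattern yields three units, which matches the "all three failed" remark in the 11:52 entry.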
=== 2025-05-09 ===
* 12:31 arturo: hard-reboot tools-bastion-13 (login.toolforge.org) because unresponsive (out of memory) -- previous reboot was for tools-bastion-12 (dev.t.o) by mistake
* 12:29 arturo: hard-reboot tools-bastion-12 (login.toolforge.org) because unresponsive (out of memory)
* 07:10 taavi: kill bunch of unwanted processes off of tools-bastion-13 [[phab:T393732|T393732]], please run your things as jobs
=== 2025-05-08 ===
* 17:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 17:40 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission
* 17:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 17:39 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission
* 17:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 17:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 17:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 16:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 16:48 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission
* 16:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 16:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission
* 16:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 16:46 raymond-ndibe@cloudcumin1001: END (ERROR) -
Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component envvars-admission * 16:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 13:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 13:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 09:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:24 taavi: root@tools-bastion-13:~# systemctl restart sssd-sudo.socket # was in failed state * 08:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-05-07 === * 18:00 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-legacy-redirector-2 * 17:58 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-legacy-redirector-2 * 16:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 14:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 12:58 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 12:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 12:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 11:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 10:36 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api * 10:30 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 09:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 09:40 taavi: remove 'roots' ldap sudo policy [[phab:T392797|T392797]] * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:33 dcaro: released jobs-cli 16.1.12 * 09:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 09:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli === 2025-05-06 === * 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:21 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 16:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 15:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 15:24 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 15:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 13:21 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 13:12 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 13:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:55 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 12:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-69 * 12:10 dcaro: rebooting tools-k8s-worker-nfs-69 due to some stuck processes * 12:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-69 === 2025-05-04 === * 11:12 dcaro: deleting tools-services-05, has been off for a year (replaced with 06) === 2025-05-02 === * 18:37 taavi: add elasticsearch credential for tools.techcontribs [[phab:T393209|T393209]] * 13:55 taavi: reboot tools-static-15 === 2025-04-28 === * 13:07 dhinus: tools-db-4: systemctl stop mariadb && systemctl start mariadb [[phab:T392596|T392596]] * 13:06 dhinus: tools-db-5: systemctl stop mariadb && systemctl start mariadb [[phab:T392596|T392596]] * 13:05 dhinus: tools-db-5: systemctl stop mariadb && systemctl start mariadb [[phab:T318479|T318479]] === 2025-04-24 === * 
23:09 bd808: `systemctl stop sssd; rm -rf /var/lib/sss/db/*; systemctl restart sssd` on tools-bastion-12 * 23:03 bd808: `sss_cache -E` on tools-bastion-12 after seeing "sudo: PAM account management error: Authentication service cannot retrieve authentication info" * 18:50 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 18:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 18:38 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-cli * 18:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 18:32 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-cli * 18:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 11:51 taavi: add missing ICMPv6 security group rule to 'default' group * 08:02 taavi: add an AAAA record for toolserver.org [[phab:T392506|T392506]] === 2025-04-23 === * 19:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 * 19:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 * 15:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-legacy-redirector-3.tools.eqiad1.wikimedia.cloud * 15:55 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-3.tools.eqiad1.wikimedia.cloud * 15:10 arturo: give `tools-tofu` bot account member powers for https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning * 13:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:36 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component builds-api * 11:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 11:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 07:02 taavi: rebooting tools-mail-4 with stuck NFS handles === 2025-04-21 === * 09:52 taavi: update pywikibot-scripts-stable image to v10.0.0 [[phab:T385400|T385400]] === 2025-04-17 === * 16:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 16:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder === 2025-04-16 === * 19:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 19:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 19:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 19:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 14:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-04-15 === * 13:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 11:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-04-11 === * 21:21 
raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 21:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 20:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-04-10 === * 15:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76 * 15:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76 === 2025-04-09 === * 21:35 bd808: Removed rook and sstefanova from https://gitlab.wikimedia.org/groups/toolforge-repos/ owners (both offboarded former WMCS staff) * 10:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-04-08 === * 15:17 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 15:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 02:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 02:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2025-04-07 === * 19:26 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 19:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:48 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-ingress-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:47 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:40 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:33 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-109 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:32 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-109 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:11 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:10 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:10 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:08 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:08 
fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-79 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:07 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:07 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-79 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:07 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-78 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:06 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-78 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-77 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:04 fnegri@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:04 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:04 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:04 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-77 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:04 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-76 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:03 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:03 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:03 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:03 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-76 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-75 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node 
tools-k8s-worker-nfs-55 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-75 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-74 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:00 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:00 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.28.14 to 1.29.15 
([[phab:T390214|T390214]]) * 13:00 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:00 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:59 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-74 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:59 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-73 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:59 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:59 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-73 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-72 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:57 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-72 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-71 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node 
tools-k8s-worker-nfs-33 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-71 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-70 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:54 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:54 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-70 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.28.14 to 1.29.15 
([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-69 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:51 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:51 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-69 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-68 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-111 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-68 
from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-67 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-111 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-110 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:48 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:48 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-67 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-110 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-66 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 
fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-66 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-65 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:45 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-worker-nfs-21 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-65 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:43 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:43 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:43 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:43 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:42 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.28.14 to 1.29.15 
([[phab:T390214|T390214]]) * 12:42 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:42 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:41 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:41 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-104 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:41 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:40 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:40 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:38 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:37 fnegri@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:30 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-control-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:22 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:22 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:15 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:07 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 11:57 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 11:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 11:54 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 08:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 08:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 07:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 07:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 05:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 05:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
=== 2025-04-06 ===
* 02:12 andrewbogott: truncating large logfiles on tools nfs
=== 2025-04-04 ===
* 10:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 09:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 09:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 09:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 09:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 09:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 09:21 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer
* 09:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 09:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 09:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 08:17 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 08:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 08:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 07:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 07:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 07:23
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 07:03 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
* 07:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 02:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes
=== 2025-04-03 ===
* 22:26 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all nodes
* 22:25 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43
* 22:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43
* 22:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14
* 22:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14
* 22:22 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-33
* 22:17 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43
* 22:16 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33
* 22:13 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-71
* 22:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43
* 22:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-70, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-74
* 22:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-71
* 21:52 andrew@cloudcumin1001: START -
Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-70, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-74
* 21:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68
* 21:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68
* 20:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55
* 20:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55
* 08:51 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13
* 08:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13
=== 2025-04-02 ===
* 20:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-55
* 20:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-55
* 12:42 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-48
* 12:37 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-48
=== 2025-04-01 ===
* 14:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45
* 13:59 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-41
* 13:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45
* 13:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24
* 13:54 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-41
* 13:49 fnegri@cloudcumin1001: START - Cookbook
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24
=== 2025-03-31 ===
* 12:48 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72
* 12:42 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72
* 12:03 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76
* 11:58 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76
* 09:04 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74
* 08:59 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74
=== 2025-03-28 ===
* 16:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21
* 16:40 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21
* 13:58 taavi: reboot tools-static-15 due to stuck nginx worker processes
* 10:10 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T389733|T389733]])
* 10:00 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T389733|T389733]])
* 09:42 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor ([[phab:T389733|T389733]])
* 09:30 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T389733|T389733]])
=== 2025-03-27 ===
* 17:34 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-33
* 17:26 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-33
* 17:26 root@cloudcumin1001: END (ERROR) - Cookbook
wmcs.toolforge.k8s.reboot (exit_code=97) for all NFS workers
* 15:59 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers
* 15:53 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for all NFS workers
* 15:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers
* 15:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 15:02 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-111.tools.eqiad1.wikimedia.cloud to the cluster
* 14:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72
* 14:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 14:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72
* 14:33 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72
* 14:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72
=== 2025-03-25 ===
* 15:32 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 15:18 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 14:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-2
* 13:58 andrewbogott: rebooting tools-k8s-worker-nfs-2
* 13:58
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-2
* 10:32 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
* 10:32 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 08:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
* 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
* 08:39 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx
* 08:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
=== 2025-03-24 ===
* 18:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 18:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 18:24 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder
* 18:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 18:16 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder
* 18:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 17:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 17:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 17:40 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 17:35 raymond-ndibe@cloudcumin1001: START - Cookbook
wmcs.toolforge.component.deploy for component jobs-api
* 17:35 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 17:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 10:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72
* 09:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72
=== 2025-03-22 ===
* 04:00 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44
* 03:55 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44
* 03:51 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68
* 03:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68
=== 2025-03-20 ===
* 14:04 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'chuckonwumelu' in role 'member'
* 14:04 aborrero@cloudcumin1001: START - Cookbook wmcs.vps.add_user_to_project for user 'chuckonwumelu' in role 'member'
=== 2025-03-18 ===
* 15:23 arturo: hard-reboot tools-prometheus-6, not responding to ssh
* 10:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9
* 10:30 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9
* 10:03 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T383238|T383238]])
* 09:57 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T383238|T383238]])
=== 2025-03-17 ===
* 19:01 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]])
* 19:00
wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 18:42 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:41 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:37 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:36 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:32 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:32 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 14:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 ([[phab:T388965|T388965]]) * 14:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 ([[phab:T388965|T388965]]) === 2025-03-16 === * 11:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 === 2025-03-15 === * 15:31 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 15:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 15:14 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for 
tools-k8s-worker-nfs-16,tools-k8s-worker-nfs-34,tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 15:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16,tools-k8s-worker-nfs-34,tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 12:55 dcaro: there was an NFS hiccup that made the NFS checks fail for a second and some workers get stuck for a bit [[phab:T388965|T388965]] === 2025-03-13 === * 22:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 22:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 18:14 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T362868|T362868]]) * 18:04 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T362868|T362868]]) * 18:00 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api ([[phab:T362868|T362868]]) * 17:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api ([[phab:T362868|T362868]]) * 17:40 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission ([[phab:T362868|T362868]]) * 17:29 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission ([[phab:T362868|T362868]]) * 17:27 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission ([[phab:T362868|T362868]]) * 17:17 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission ([[phab:T362868|T362868]]) * 17:14 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api 
([[phab:T362868|T362868]]) * 17:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362868|T362868]]) * 16:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission ([[phab:T362868|T362868]]) * 16:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission ([[phab:T362868|T362868]]) * 16:25 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission ([[phab:T362868|T362868]]) * 16:14 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission ([[phab:T362868|T362868]]) * 10:17 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 * 10:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 === 2025-03-12 === * 17:56 dhinus: aptly repo remove bookworm-tools helmfile, removing custom version that is older than the one from apt.w.o * 03:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-03-11 === * 17:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 14:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:31 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 14:27 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:58 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 10:46 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission === 2025-03-10 === * 20:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 20:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 20:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 20:20 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 20:09 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 20:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 20:05 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 20:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 19:59 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 19:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:55 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 19:51 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:50 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 19:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 18:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 17:44 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 17:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder === 2025-03-07 === * 13:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-5 * 13:18 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-5 === 2025-03-06 === * 13:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 12:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 12:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 12:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 12:15 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 12:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-03-05 === * 19:16 dhinus: systemctl restart prometheus@tools on tools-prometheus-7 (the two prom hosts are returning 
different values) * 17:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362868|T362868]]) * 17:44 fnegri@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.2 ([[phab:T362868|T362868]]) * 17:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362868|T362868]]) * 16:06 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 16:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 09:13 dcaro: restarting ingress pods due to ingress timing out sometimes * 08:09 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 08:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2025-03-04 === * 20:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:47 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 20:28 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 15:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362868|T362868]]) * 14:01 fnegri@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.12.0 ([[phab:T362868|T362868]]) * 14:01 
fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362868|T362868]]) * 13:51 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:42 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:40 dhinus: reboot tools-legacy-redirector-2 (http probes failing more than usual) * 12:50 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 12:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 10:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 09:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 09:15 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 09:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:58 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-03-03 === * 17:04 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:55 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 16:18 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 16:09 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 13:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld 
* 13:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 13:10 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 13:01 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 11:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 11:15 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 09:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-03-01 === * 19:08 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 19:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 * 16:26 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 * 16:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 * 15:51 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 * 15:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 === 2025-02-27 === * 16:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component 
builds-builder * 14:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder === 2025-02-26 === * 14:22 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:05 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-02-25 === * 19:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 === 2025-02-24 === * 21:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 21:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 21:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 20:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 20:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-02-21 === * 12:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 === 2025-02-20 === * 13:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer 
([[phab:T320284|T320284]]) * 13:18 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer ([[phab:T320284|T320284]]) === 2025-02-19 === * 20:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 20:25 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 20:25 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37 * 20:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37 === 2025-02-18 === * 17:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-54 * 17:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-54 * 16:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 16:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 * 15:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103, tools-k8s-worker-108, tools-k8s-control-7 ([[phab:T380679|T380679]]) * 15:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103, tools-k8s-worker-108, tools-k8s-control-7 ([[phab:T380679|T380679]]) * 15:03 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 ([[phab:T380679|T380679]]) * 15:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 ([[phab:T380679|T380679]]) === 2025-02-17 === * 17:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 17:32 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
=== 2025-02-10 ===
* 12:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 12:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 12:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 12:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 12:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 12:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
=== 2025-02-09 ===
* 16:38 andrewbogott: rebooting tools-db-4 just in case that helps with the recurring DB crashes
=== 2025-02-07 ===
* 20:51 arturo: resize tools-legacy-redirector to have 2 vCPU [[phab:T385908|T385908]]
* 17:58 andrewbogott: "SET GLOBAL read_only=OFF; " on tools-db-4; both -5 and -4 were set to read only. No idea why or how...
* 01:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7
* 01:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7
* 01:28 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-07
* 01:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-07
* 01:27 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-07
* 01:27 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-07
=== 2025-02-06 ===
* 17:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 17:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
* 15:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 15:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 14:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 14:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 14:50 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 14:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 14:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 14:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 14:36 raymond-ndibe@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:06 andrewbogott: cold-migrating tools-proxy-8 for [[phab:T385264|T385264]]; will cause a brief toolforge outage * 14:05 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 14:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:39 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 13:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 13:06 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 13:02 raymond-ndibe@cloudcumin1001: START - 
Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 12:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 12:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 12:37 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 12:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 12:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2025-02-03 === * 14:40 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-haproxy-5, tools-k8s-haproxy-6 * 14:40 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-haproxy-5, tools-k8s-haproxy-6 * 13:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-9, tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 * 13:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-9, tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 * 13:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 * 13:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 * 13:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-7 * 13:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 === 2025-02-01 === * 15:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for 
tools-k8s-worker-108 * 15:05 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-108 * 15:05 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-107 * 15:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-107 * 15:04 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-106 * 15:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-106 * 15:03 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-105 * 15:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-105 * 15:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103 * 15:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103 * 15:01 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-102 * 15:01 andrewbogott: rebooting all k8s (non-nfs) worker nodes for [[phab:T385264|T385264]] * 15:00 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-102 * 14:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 14:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 * 14:56 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 * 14:55 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 * 14:55 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-71 * 14:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-71 * 14:53 andrew@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-66 * 14:48 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-66 * 14:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54 * 14:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54 * 14:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50 * 14:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50 * 14:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-47 * 14:45 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-47 * 14:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-46 * 14:44 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-46 * 14:43 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45 * 14:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45 * 14:42 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43 * 14:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43 * 14:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-40 * 14:40 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-40 * 14:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39 * 14:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39 * 14:38 
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-3 * 14:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-3 * 14:37 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-32 * 14:36 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-32 * 14:36 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 * 14:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24 * 14:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-1 * 14:34 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1 * 14:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17 * 14:33 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17 * 14:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14 * 14:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14 * 14:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13 * 14:30 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13 * 14:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12 * 14:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12 * 14:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-11 * 14:29 andrewbogott: rebooting all k8s-nfs worker nodes for [[phab:T385264|T385264]] * 14:28 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-11 * 14:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 * 14:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 * 14:22 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 * 14:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 * 14:20 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 * 14:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 === 2025-01-31 === * 11:04 dhinus: systemctl restart prometheus@tools on tools-prometheus-7 [[phab:T385262|T385262]] === 2025-01-29 === * 01:10 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli === 2025-01-27 === * 16:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 15:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:52 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 13:52 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component jobs-api * 13:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-01-26 === * 22:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 * 22:04 andrewbogott: restarting node tools-k8s-worker-nfs-44, too many D processes * 22:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 * 22:02 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-m8s-worker-nfs-44 * 22:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-m8s-worker-nfs-44 * 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud * 08:37 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud * 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:37 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-79.tools.eqiad1.wikimedia.cloud to the cluster * 08:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T384790|T384790]]) * 08:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster 
* 08:26 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-78.tools.eqiad1.wikimedia.cloud to the cluster * 08:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T384790|T384790]]) * 08:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:16 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-77.tools.eqiad1.wikimedia.cloud to the cluster * 08:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T384790|T384790]]) * 08:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 08:06 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-110.tools.eqiad1.wikimedia.cloud to the cluster * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster ([[phab:T384790|T384790]]) * 07:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 07:56 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud to the cluster * 07:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster ([[phab:T384790|T384790]]) * 07:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-55 * 07:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-55 === 2025-01-24 === * 10:39 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-41 * 10:34 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-41 === 2025-01-23 === * 
14:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:39 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:10 dcaro: reboot tools-static-15 due to nginx stuck on nfs === 2025-01-22 === * 17:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23 * 17:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23 === 2025-01-18 === * 15:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7 * 15:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7 === 2025-01-17 === * 15:52 dhinus: reboot tools-legacy-redirector-2 (http probes were failing) === 2025-01-15 === * 04:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 04:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 03:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-01-13 === * 21:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-47 ([[phab:T383625|T383625]]) * 21:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-47 ([[phab:T383625|T383625]]) * 21:30 andrew@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 ([[phab:T383625|T383625]]) * 21:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19 ([[phab:T383238|T383238]]) * 21:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 ([[phab:T383625|T383625]]) * 21:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 ([[phab:T383625|T383625]]) * 21:24 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19 ([[phab:T383238|T383238]]) * 21:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 ([[phab:T383625|T383625]]) * 21:19 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 21:18 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 21:18 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-21 ([[phab:T383238|T383238]]) * 21:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 ([[phab:T383625|T383625]]) * 21:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 ([[phab:T383625|T383625]]) * 21:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 ([[phab:T383238|T383238]]) * 21:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-2 ([[phab:T383238|T383238]]) * 21:14 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-75 ([[phab:T383238|T383238]]) * 21:13 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-75 ([[phab:T383238|T383238]]) * 21:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45 ([[phab:T383625|T383625]]) * 21:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-2 ([[phab:T383238|T383238]]) * 21:08 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 21:05 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45 ([[phab:T383625|T383625]]) * 21:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 21:03 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13 ([[phab:T383238|T383238]]) * 20:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13 ([[phab:T383238|T383238]]) * 20:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16 ([[phab:T383238|T383238]]) * 20:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36 ([[phab:T383625|T383625]]) * 20:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16 ([[phab:T383238|T383238]]) * 20:53 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 20:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 ([[phab:T383625|T383625]]) * 20:49 dcaro: restart prometheus to pick up the new ips for vms and such * 20:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 20:47 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 20:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-8 * 20:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 20:43 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-20 ([[phab:T383625|T383625]]) * 20:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20 ([[phab:T383625|T383625]]) * 20:42 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-20 ([[phab:T383238|T383238]]) * 20:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20 ([[phab:T383238|T383238]]) * 20:42 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 20:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-8 * 20:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 20:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 20:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 * 20:36 lucaswerkmeister: restore root-owned /tmp/framer.txt on tools-sgebastion-10, tools-bastion-12, tools-bastion-13 (cf. 
2025-01-05 log entry) following bastion reboots === 2025-01-12 === * 09:53 taavi: hard reboot tools-k8s-worker-nfs-55 === 2025-01-08 === * 18:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43 ([[phab:T383238|T383238]]) * 18:34 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43 ([[phab:T383238|T383238]]) * 18:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-32 ([[phab:T383238|T383238]]) * 18:26 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-32 ([[phab:T383238|T383238]]) * 18:19 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17 ([[phab:T383238|T383238]]) * 18:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17 ([[phab:T383238|T383238]]) * 18:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 18:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 18:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-47 ([[phab:T383238|T383238]]) * 18:06 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-47 ([[phab:T383238|T383238]]) * 18:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-41 ([[phab:T383238|T383238]]) * 18:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-41 ([[phab:T383238|T383238]]) * 18:04 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-8 ([[phab:T383238|T383238]]) * 17:59 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-8 ([[phab:T383238|T383238]]) * 17:59 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-27 ([[phab:T383238|T383238]]) * 17:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-27 ([[phab:T383238|T383238]]) * 17:53 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-67 ([[phab:T383238|T383238]]) * 17:48 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67 ([[phab:T383238|T383238]]) * 17:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37 ([[phab:T383238|T383238]]) * 17:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37 ([[phab:T383238|T383238]]) * 17:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-26 ([[phab:T383238|T383238]]) * 17:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-26 ([[phab:T383238|T383238]]) * 17:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76 ([[phab:T383238|T383238]]) * 17:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76 ([[phab:T383238|T383238]]) * 17:27 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 ([[phab:T383238|T383238]]) * 17:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 ([[phab:T383238|T383238]]) * 17:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12 ([[phab:T383238|T383238]]) * 17:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12 
([[phab:T383238|T383238]]) * 17:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-48 ([[phab:T383238|T383238]]) * 17:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-48 ([[phab:T383238|T383238]]) * 16:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 16:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 16:51 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-65 ([[phab:T383238|T383238]]) * 16:45 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-65 ([[phab:T383238|T383238]]) * 16:38 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 ([[phab:T383238|T383238]]) * 16:33 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 ([[phab:T383238|T383238]]) * 16:25 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 16:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 16:00 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 15:55 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 15:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36 * 15:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 * 15:40 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 15:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 * 15:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-42 * 15:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-42 * 15:29 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22 * 15:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-22 * 15:09 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45 * 15:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45 * 14:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70 * 14:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70 * 14:25 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-70 * 14:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-70 * 14:16 dcaro: reboot tools-static-15 nfs is stuck === 2025-01-07 === * 00:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 00:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 00:14 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 00:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 00:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component 
components-api * 00:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 00:09 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 00:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 00:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor === 2025-01-06 === * 23:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 23:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 23:56 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 23:55 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 23:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 23:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 23:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 23:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 23:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 23:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 23:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for 
component maintain-harbor * 16:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor === 2025-01-05 === * 18:58 lucaswerkmeister: remove /tmp/framer.txt on tools-bastion-13 (I notified the owner privately), and replace it with a root-owned file to prevent iTerm from leaking logs into it (https://iterm2.com/downloads/stable/iTerm2-3_5_11.changelog) on tools-sgebastion-10, tools-bastion-12 and tools-bastion-13 === 2025-01-03 === * 21:46 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-69 * 21:41 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-69 * 21:40 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-69 * 21:35 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-69 === 2025-01-02 === * 02:28 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-61 * 02:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-61 === 2025-01-01 === * 21:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7 * 21:05 andrewbogott: truncating *.err and *.out files to clear out NFS space * 21:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7 * 21:04 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-34 * 20:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-34 === 2024-12-13 === * 14:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 14:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
toolforge-weld * 14:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 14:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 09:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 * 09:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 * 09:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 * 09:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 * 08:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-73 * 08:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-73 === 2024-12-12 === * 10:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-5 * 10:47 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-5 === 2024-12-06 === * 17:26 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-1 ([[phab:T352206|T352206]]) * 17:25 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-1 ([[phab:T352206|T352206]]) * 17:24 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-3 ([[phab:T352206|T352206]]) * 17:23 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-3 ([[phab:T352206|T352206]]) * 07:56 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 07:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2024-12-05 === * 16:34 
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:42 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 14:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 14:06 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 13:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2024-12-04 === * 19:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 19:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 19:26 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 19:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 17:46 andrewbogott: rebooting tools-legacy-redirector-2, many probes failing * 17:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 17:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 17:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 17:03 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 16:54 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:47 sstefanova@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component volume-admission * 16:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:45 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 15:45 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:26 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 15:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 15:11 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component envvars-api * 15:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 15:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 14:46 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:45 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 01:31 
raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:18 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:17 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:17 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:15 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:14 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:12 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
components-api
=== 2024-12-03 ===
* 22:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 22:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 22:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 21:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 21:55 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component main
* 21:55 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component main
=== 2024-11-29 ===
* 03:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 03:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 03:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 03:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 03:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 03:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
=== 2024-11-27 ===
* 18:26 taavi: kubectl sudo rollout restart -n kube-system deployment coredns # update resolv.conf in coredns containers
=== 2024-11-26 ===
* 10:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-7
* 10:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7
* 10:36 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for 
tools-k8s-control-7 * 10:35 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:34 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7 * 10:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:32 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7 * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:31 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7 * 10:30 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-9 * 10:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-9 * 10:22 dcaro: rebooting k8s-control-9 * 10:18 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 * 10:17 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 * 10:17 dcaro: rebooting k8s-control-8 * 09:15 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 09:14 dcaro: restarting tools-k8s-worker-nfs-72 * 09:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 * 09:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70 * 09:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70 * 09:12 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50 * 09:12 dcaro: restarting tools-k8s-worker-nfs-70 * 09:11 dcaro: restarting 
tools-k8s-worker-nfs-50
* 09:11 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50
* 09:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17
* 09:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17
* 08:34 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-61
* 08:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-61
* 07:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers ([[phab:T380827|T380827]])
* 06:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers ([[phab:T380827|T380827]])
=== 2024-11-25 ===
* 13:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli
* 12:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
=== 2024-11-23 ===
* 07:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder ([[phab:T358225|T358225]])
* 07:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder ([[phab:T358225|T358225]])
=== 2024-11-20 ===
* 15:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 14:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 14:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 12:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component envvars-admission
* 12:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 00:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission ([[phab:T362867|T362867]])
* 00:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission ([[phab:T362867|T362867]])
=== 2024-11-19 ===
* 21:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 21:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 21:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 21:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 21:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 21:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 21:05 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 20:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 20:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 20:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer
* 20:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 20:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics
* 20:38 
raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api * 20:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 20:31 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component envvars-api ([[phab:T362867|T362867]]) * 20:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362867|T362867]]) * 20:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api ([[phab:T362867|T362867]]) * 20:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362867|T362867]]) * 20:17 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T362867|T362867]]) * 20:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T362867|T362867]]) * 20:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T362867|T362867]]) * 20:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T362867|T362867]]) * 19:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission ([[phab:T362867|T362867]]) * 19:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission ([[phab:T362867|T362867]]) * 19:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission ([[phab:T362867|T362867]]) * 19:23 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component ingress-admission ([[phab:T362867|T362867]])
* 15:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 15:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
=== 2024-11-18 ===
* 14:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
* 14:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 14:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 14:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 11:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 11:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
=== 2024-11-15 ===
* 14:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-5.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:04 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-5.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:03 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T352206|T352206]])
* 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]])
* 13:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T352206|T352206]])
* 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T352206|T352206]])
* 13:50 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix 
(exit_code=99) with prefix 'tools-db' ([[phab:T352206|T352206]])
* 13:49 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]])
=== 2024-11-14 ===
* 13:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
* 13:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 13:04 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 13:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 13:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
=== 2024-11-12 ===
* 15:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 10:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
* 10:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 10:11 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 10:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
=== 2024-11-11 ===
* 16:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 15:58 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-4.tools.eqiad1.wikimedia.cloud 
([[phab:T352206|T352206]])
* 14:44 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:42 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T352206|T352206]])
* 14:37 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]])
* 14:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 13:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
=== 2024-11-10 ===
* 02:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362867|T362867]])
* 02:47 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.11.0 ([[phab:T362867|T362867]])
* 02:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362867|T362867]])
=== 2024-11-06 ===
* 16:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 16:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 15:48 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 10:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 ([[phab:T379139|T379139]])
* 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-24 ([[phab:T379139|T379139]])
* 07:57 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 07:52 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 07:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 07:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-11-05 ===
* 17:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 17:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 09:40 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 08:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 08:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics
* 08:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 08:17 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 07:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico
* 07:44 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico
=== 2024-11-04 ===
* 16:39 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 16:34 sstefanova@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api * 16:30 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 16:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:22 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 16:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.27.16 to 1.28.14 
([[phab:T362867|T362867]]) * 14:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-76 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-76 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-75 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-75 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-74 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-74 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-73 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-73 from 1.27.16 to 1.28.14 
([[phab:T362867|T362867]]) * 14:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-72 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-72 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-71 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-71 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-70 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-70 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-69 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-68 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-68 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-67 from 
1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-67 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-66 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-66 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-65 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-65 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:25 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:20 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 
1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node 
tools-k8s-worker-nfs-50 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:55 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:52 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node 
tools-k8s-worker-nfs-45 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:50 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:44 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:43 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:31 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:20 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:13 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:10 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:10 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
* 13:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 13:04 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
* 13:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 12:55 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:39 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:22 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 12:16 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 12:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:11 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api
* 12:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 12:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:59 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api
* 11:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 11:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:19 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 11:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 10:56 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 10:42 dcaro: added api.svc.toolforge.org dns record entry
* 10:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 10:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 10:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 10:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 09:56 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
* 09:55 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 09:51 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
* 09:48 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 09:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 09:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway

=== 2024-10-22 ===
* 13:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23
* 13:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23
* 12:58 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-33, tools-k8s-woker-nfs-23
* 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33, tools-k8s-woker-nfs-23
* 09:05 arturo: restart puppetserver service for [[phab:T377803|T377803]]

=== 2024-10-16 ===
* 09:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 09:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api

=== 2024-10-15 ===
* 17:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 17:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 16:16 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
* 16:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway

=== 2024-10-14 ===
* 09:14 dcaro: migrating pipelineruns stored versions to v1 ([[phab:T376710|T376710]])
* 07:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9
* 07:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9
* 07:24 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9
* 07:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9

=== 2024-10-09 ===
* 09:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers

=== 2024-10-08 ===
* 13:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld ([[phab:T376710|T376710]])
* 13:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld ([[phab:T376710|T376710]])
* 12:38 dcaro: tests are passing correctly, upgrade finished, will investigate the increased slowness as a followup
* 12:27 dcaro: upgrade finished, build actions have become slower than usual ([[phab:T376710|T376710]]), running tests and investigating
* 12:02 dcaro: starting toolforge builds-builder upgrade, no downtime expected though some builds might fail to start/list/log/show while the upgrade is in progress [[phab:T374908|T374908]]
* 08:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 08:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 08:24 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers
* 08:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers

=== 2024-10-04 ===
* 11:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 11:51 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 11:44 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 11:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api

=== 2024-10-02 ===
* 09:11 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers
* 09:07 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers

=== 2024-10-01 ===
* 10:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 10:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 10:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 10:28 dcaro: updated ci image with latest precommit versions
* 10:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:52 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission
* 09:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission

=== 2024-09-30 ===
* 18:25 taavi: run striker migrations [[phab:T359428|T359428]]

=== 2024-09-28 ===
* 00:14 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 00:07 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli

=== 2024-09-27 ===
* 23:58 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 23:52 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld

=== 2024-09-26 ===
* 16:45 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 16:40 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 16:24 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 16:18 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 16:18 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
* 16:08 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 16:05 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 15:58 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 10:26 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 10:20 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 10:12 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli
* 10:05 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
* 07:53 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 07:46 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld

=== 2024-09-25 ===
* 08:00 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7
* 07:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7

=== 2024-09-24 ===
* 22:11 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T375157|T375157]])
* 22:03 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T375157|T375157]])
* 21:48 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component kyverno ([[phab:T359641|T359641]])
* 21:41 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component kyverno ([[phab:T359641|T359641]])

=== 2024-09-20 ===
* 20:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T341066|T341066]])
* 20:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]])
* 20:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico ([[phab:T341066|T341066]])
* 20:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]])
* 19:36 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico ([[phab:T341066|T341066]])
* 19:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]])
* 17:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 17:06 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/pod2daemon-flexvol:v3.28.2 ([[phab:T359641|T359641]])
* 17:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 17:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 17:04 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/typha:v3.28.2 ([[phab:T359641|T359641]])
* 17:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 17:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 17:03 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/node:v3.28.2 ([[phab:T359641|T359641]])
* 17:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 17:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 17:02 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/kube-controllers:v3.28.2 ([[phab:T359641|T359641]])
* 17:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 16:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 16:59 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/ctl:v3.28.2 ([[phab:T359641|T359641]])
* 16:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 16:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 16:56 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/cni:v3.28.2 ([[phab:T359641|T359641]])
* 16:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 16:54 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/calico/cni:v3.28.2 ([[phab:T359641|T359641]])
* 16:54 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 06:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=1)
* 00:39 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T359641|T359641]])
* 00:32 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T359641|T359641]])

=== 2024-09-19 ===
* 23:17 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=97) ([[phab:T359641|T359641]])
* 23:17 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.10 ([[phab:T359641|T359641]])
* 23:17 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 23:12 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 23:11 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.10.1 ([[phab:T359641|T359641]])
* 23:11 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 22:38 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 22:37 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]])
* 22:37 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 22:36 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]])
* 22:36 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]])
* 22:36 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 22:35 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=97) ([[phab:T359641|T359641]])
* 22:35 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]])
* 22:35 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 17:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli ([[phab:T341066|T341066]])
* 17:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli ([[phab:T341066|T341066]])
* 17:13 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api ([[phab:T341066|T341066]])
* 17:06 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 16:48 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]])
* 16:46 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 16:45 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api
* 16:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 16:38 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 16:26 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 16:10 dcaro: rebooting tools-k8s-worker-nfs-24 it's stuck without network
* 16:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 16:08 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 16:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 16:07 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 16:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 15:28 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]])
* 15:27 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 15:19 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]])
* 15:18 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 15:08 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]])
* 15:07 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 15:01 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api ([[phab:T341066|T341066]])
* 14:57 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 14:56 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api ([[phab:T341066|T341066]])
* 14:50 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])

=== 2024-09-17 ===
* 08:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70 ([[phab:T359641|T359641]])
* 08:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70 ([[phab:T359641|T359641]])
* 08:43 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]])
* 08:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 ([[phab:T359641|T359641]])
* 08:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]])
* 08:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 ([[phab:T359641|T359641]])
* 08:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]])
* 08:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]])
* 03:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:13 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-64
* 03:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-63
* 03:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]])
* 03:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 03:07 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-76.tools.eqiad1.wikimedia.cloud to the cluster
* 03:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]])
* 03:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 03:00 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud to the cluster
* 02:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 02:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 02:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 02:46 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-74.tools.eqiad1.wikimedia.cloud to the cluster
* 02:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-62
* 02:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-60
* 02:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-62 ([[phab:T359641|T359641]])
* 02:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]])
* 02:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 02:38 raymond-ndibe@cloudcumin1001: Added a new k8s
worker-nfs tools-k8s-worker-nfs-73.tools.eqiad1.wikimedia.cloud to the cluster * 02:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:32 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-72.tools.eqiad1.wikimedia.cloud to the cluster * 02:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:24 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-71.tools.eqiad1.wikimedia.cloud to the cluster * 02:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:12 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-6 * 02:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-56 * 02:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:08 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud to the cluster * 02:05 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 02:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]]) * 02:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-49 * 02:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-31 * 01:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:57 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-69.tools.eqiad1.wikimedia.cloud to the cluster * 01:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]]) * 01:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]]) * 01:56 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-30 * 01:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]]) * 01:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-29 * 01:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-30 
([[phab:T359641|T359641]]) * 01:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]]) * 01:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]]) * 01:46 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-64 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]]) * 01:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-28 * 01:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:42 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-68.tools.eqiad1.wikimedia.cloud to the cluster * 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]]) * 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-64 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:40 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-63 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]]) * 01:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-62 
([[phab:T359641|T359641]]) * 01:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:34 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-62 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]]) * 01:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:32 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-67.tools.eqiad1.wikimedia.cloud to the cluster * 01:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-62 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-60 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) 
* 01:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:23 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-66.tools.eqiad1.wikimedia.cloud to the cluster * 01:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 01:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-60 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:22 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-6 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]]) * 01:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]]) * 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:15 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-56 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:14 raymond-ndibe@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]]) * 01:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]]) * 01:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-49 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]]) * 01:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.26.15 to 1.27.16 
([[phab:T359641|T359641]]) * 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]]) * 00:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:59 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-31 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-30 ([[phab:T359641|T359641]]) * 00:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-30 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]]) * 00:47 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]]) * 00:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-29 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]]) * 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]]) * 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:41 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-28 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.26.15 to 1.27.16 
([[phab:T359641|T359641]]) * 00:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 00:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-60, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-62, tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]]) * 00:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-56, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 00:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-56, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 00:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-49, tools-k8s-worker-nfs-50 ([[phab:T359641|T359641]]) * 00:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-60, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-62, tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]]) * 00:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-31, 
tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-33, tools-k8s-worker-nfs-36 ([[phab:T359641|T359641]])
=== 2024-09-16 ===
* 17:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45
* 17:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45
* 17:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6
* 17:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6
=== 2024-09-13 ===
* 11:18 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54 ([[phab:T374692|T374692]])
* 11:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54 ([[phab:T374692|T374692]])
* 09:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:12 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
=== 2024-09-12 ===
* 12:06 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:54 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-23, tools-k8s-worker-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23, tools-k8s-worker-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-28 ([[phab:T374612|T374612]])
* 11:37 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-28 ([[phab:T374612|T374612]])
=== 2024-09-11 ===
* 10:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-09-09 ===
* 16:23 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component cert-manager
* 16:16 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component cert-manager
=== 2024-09-06 ===
* 08:47 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 08:42 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 08:38 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 08:36 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 07:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0)
* 07:14 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/pause:3.6
* 07:14 
sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry === 2024-09-05 === * 13:50 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 13:50 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/stakater-reloader:v1.1.0 ([[phab:T359641|T359641]]) * 13:50 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:46 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 13:45 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]]) * 13:45 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:41 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]]) * 13:41 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]]) * 13:41 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:40 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]]) * 13:40 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]]) * 13:40 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:28 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 13:27 wmbot~raymondndibe@wmf3402: Updating container image 
docker-registry.tools.wmflabs.org/cert-manager/cainjector:v1.15.3 ([[phab:T359641|T359641]]) * 13:27 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:26 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 13:26 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/webhook:v1.15.3 ([[phab:T359641|T359641]]) * 13:26 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 13:24 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 13:23 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/controller:v1.15.3 ([[phab:T359641|T359641]]) * 13:23 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) === 2024-09-04 === * 14:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:02 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 13:56 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 13:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 13:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
volume-admission * 13:36 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 13:35 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 13:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 13:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 13:02 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 13:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2024-09-03 === * 20:19 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 19:53 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 19:48 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 19:36 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 19:29 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 15:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component kyverno * 15:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component kyverno * 15:29 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component kyverno * 15:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component kyverno * 14:41 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) * 14:41 wmbot~dcaro@urcuchillay: START - Cookbook 
wmcs.openstack.cloudvirt.vm_console * 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 14:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 14:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.28.5 ([[phab:T359641|T359641]]) * 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.12.5 ([[phab:T359641|T359641]]) * 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.12.5 ([[phab:T359641|T359641]]) * 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.12.5 ([[phab:T359641|T359641]]) * 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.12.5 ([[phab:T359641|T359641]]) * 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.12.5 ([[phab:T359641|T359641]]) * 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.12.5 ([[phab:T359641|T359641]]) * 14:04 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry ([[phab:T359641|T359641]]) * 13:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 13:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 13:55 wmbot~dcaro@urcuchillay: Updating container image 
docker-registry.tools.wmflabs.org/bitnami-kubectl:1.28.5 ([[phab:T359641|T359641]]) * 13:54 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.12.5 ([[phab:T359641|T359641]]) * 13:54 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.12.5 ([[phab:T359641|T359641]]) * 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.12.5 ([[phab:T359641|T359641]]) * 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.12.5 ([[phab:T359641|T359641]]) * 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.12.5 ([[phab:T359641|T359641]]) * 13:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry ([[phab:T359641|T359641]]) * 13:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 13:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 13:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 13:04 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 11:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 10:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 10:15 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component builds-api
* 09:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 09:51 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 05:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.25.16 to 1.26.15
* 05:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.25.16 to 1.26.15
* 05:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.25.16 to 1.26.15
* 05:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.25.16 to 1.26.15
* 05:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.25.16 to 1.26.15
* 05:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.25.16 to 1.26.15
=== 2024-09-02 ===
* 14:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:30 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade
(exit_code=0) for node tools-k8s-worker-106 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-64 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-64 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 14:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:33 
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.25.16 to 1.26.15 * 13:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.25.16 to 1.26.15 * 13:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.25.16 to 1.26.15 * 13:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.25.16 to 1.26.15 * 13:30 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.25.16 to 1.26.15 * 13:30 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-62 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.25.16 to 1.26.15 * 13:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.25.16 to 1.26.15 * 13:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-62 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:27 sstefanova@cloudcumin1001: START - 
Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.25.16 to 1.26.15 * 13:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-60 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-60 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:25 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.25.16 to 1.26.15 * 13:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.25.16 to 1.26.15 * 13:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.25.16 to 1.26.15 * 13:22 sstefanova@cloudcumin1001: 
START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.25.16 to 1.26.15 * 13:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.25.16 to 1.26.15 * 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.25.16 to 1.26.15 * 13:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:17 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-51 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:17 
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.25.16 to 1.26.15 * 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.25.16 to 1.26.15 * 13:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.25.16 to 1.26.15 * 13:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.25.16 to 1.26.15 * 13:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.25.16 to 1.26.15 * 
13:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.25.16 to 1.26.15 * 13:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:11 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.25.16 to 1.26.15 * 13:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.25.16 to 1.26.15 * 13:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.25.16 to 
1.26.15 * 13:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.25.16 to 1.26.15 * 13:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.25.16 to 1.26.15 * 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.25.16 to 1.26.15 * 13:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.25.16 to 1.26.15 * 13:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.25.16 to 1.26.15 * 13:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:04 
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.25.16 to 1.26.15 * 13:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:02 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:01 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.25.16 to 1.26.15 * 13:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:01 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.25.16 to 1.26.15 * 13:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.25.16 to 1.26.15 * 12:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.25.16 to 1.26.15 * 
12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.25.16 to 1.26.15 * 12:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.25.16 to 1.26.15 * 12:56 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.25.16 to 1.26.15 * 12:55 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.25.16 to 1.26.15 * 12:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) 
* 12:54 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.25.16 to 1.26.15 * 12:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:47 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.25.16 to 1.26.15 * 12:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.25.16 to 1.26.15 * 12:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.25.16 to 1.26.15 * 12:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.25.16 to 1.26.15 * 12:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.25.16 to 1.26.15 * 12:40 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.25.16 to 1.26.15 * 12:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.25.16 to 1.26.15 * 12:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.25.16 to 1.26.15 * 12:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.25.16 to 1.26.15 * 12:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.25.16 to 1.26.15 * 12:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.25.16 to 1.26.15 * 12:31 sstefanova@cloudcumin1001: 
START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.25.16 to 1.26.15 * 12:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.25.16 to 1.26.15 * 12:27 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.25.16 to 1.26.15 * 12:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.25.16 to 1.26.15 * 12:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.25.16 to 1.26.15 * 12:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.25.16 to 1.26.15 * 12:12 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.25.16 to 1.26.15 * 12:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.25.16 to 1.26.15 * 12:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.25.16 to 1.26.15 * 11:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.25.16 to 1.26.15 * 11:48 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.25.16 to 1.26.15 * 11:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.25.16 to 1.26.15 * 11:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.25.16 to 1.26.15 * 10:05 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) 
for component builds-api
* 09:58 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 09:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 09:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 09:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 09:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 09:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 08:48 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component components-api
* 08:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2024-08-29 ===
* 16:32 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 16:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 08:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
* 07:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
=== 2024-08-27 ===
* 12:06 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0)
* 12:06 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/nginx-ingress-controller:v1.11.2
* 12:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry
* 09:46 wmbot~dcaro@urcuchillay: END (PASS)
- Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:46 wmbot~dcaro@urcuchillay: Added a new k8s worker tools-k8s-worker-108.tools.eqiad1.wikimedia.cloud to the cluster * 09:36 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 08:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 08:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component calico * 08:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component calico * 08:55 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 08:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 08:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-52 ([[phab:T373243|T373243]]) * 08:37 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-52 ([[phab:T373243|T373243]]) * 08:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-51 ([[phab:T373243|T373243]]) * 08:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-51 ([[phab:T373243|T373243]]) * 08:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-25 ([[phab:T373243|T373243]]) * 08:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-25 ([[phab:T373243|T373243]]) * 08:31 
wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-18 ([[phab:T373243|T373243]])
* 08:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-18 ([[phab:T373243|T373243]])
* 08:29 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-15 ([[phab:T373243|T373243]])
* 08:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-15 ([[phab:T373243|T373243]])
* 08:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]])
* 08:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]])
* 08:19 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
* 08:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
=== 2024-08-26 ===
* 21:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 21:13 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-64.tools.eqiad1.wikimedia.cloud to the cluster
* 21:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 21:03 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster
* 21:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 20:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 20:23 wmbot~dcaro@urcuchillay: Added a
new k8s worker-nfs tools-k8s-worker-nfs-63.tools.eqiad1.wikimedia.cloud to the cluster * 20:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 20:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 20:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase * 18:35 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 17:49 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-62.tools.eqiad1.wikimedia.cloud to the cluster * 17:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 17:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase * 17:33 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 17:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 17:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase * 17:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 17:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:04 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook 
wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 17:04 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-61.tools.eqiad1.wikimedia.cloud to the cluster * 16:54 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:54 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:54 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-60.tools.eqiad1.wikimedia.cloud to the cluster * 16:42 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 16:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:14 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-58.tools.eqiad1.wikimedia.cloud to the cluster * 16:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:02 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:02 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-57.tools.eqiad1.wikimedia.cloud to the cluster * 15:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:49 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 15:44 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:39 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:38 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster * 15:35 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:33 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:15 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]]) * 13:12 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-4, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-18, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-51, tools-k8s-worker-nfs-52, tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 13:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-4, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-18, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-51, tools-k8s-worker-nfs-52, tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:53 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:44 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:42 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 11:06 dcaro: manually deleted the coredns pods that had been around for 4d * 09:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 09:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 08:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 08:18 dcaro: scale up coredns deployment to 4 replicas === 2024-08-21 === * 05:44 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 05:38 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 05:27 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 05:20 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 05:01 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 04:55 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 04:43 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component volume-admission * 04:36 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:28 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 04:25 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:22 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 04:21 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:20 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 04:20 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:10 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 04:03 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 03:49 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:42 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 03:33 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 03:28 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 03:19 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 03:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 03:13 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2024-08-19 === * 22:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot 
(exit_code=0) for tools-k8s-worker-nfs-24 * 21:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24 * 21:52 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17 * 21:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17 * 21:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-17,tools-k8s-worker-nfs-24 * 21:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17,tools-k8s-worker-nfs-24 === 2024-08-15 === * 06:30 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-20 * 06:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20 === 2024-08-13 === * 09:54 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 07:39 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6 * 07:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6 === 2024-08-12 === * 15:33 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 15:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:51 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice * 11:46 wmbot~dcaro@urcuchillay: START - 
Cookbook wmcs.toolforge.component.deploy for component tools-webservice * 10:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2024-08-08 === * 16:57 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 16:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 16:36 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 16:30 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 16:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2024-08-06 === * 09:50 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=1) * 09:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:50 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:20 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:20 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 09:19 wmbot~dcaro@urcuchillay: START - 
Cookbook wmcs.openstack.cloudvirt.vm_console * 09:19 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 09:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2024-08-05 === * 13:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api * 13:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api * 11:42 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 11:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 09:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 08:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 08:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway === 2024-08-01 === * 20:42 bd808: Uncordoned tools-k8s-worker-nfs-55 following reboot * 20:40 bd808: Hard reboot of tools-k8s-worker-nfs-55 following drain cookbook run. Stuck pod remained stuck as expected. 
* 20:37 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-55 * 20:32 bd808: Draining and rebooting tools-k8s-worker-nfs-55 after reports of stuck pods via irc * 20:32 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-55 * 15:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api * 15:31 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api === 2024-07-31 === * 20:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 20:36 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 20:26 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-cli * 20:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 16:17 andrewbogott: changing login.tools.wmlabs.org to point to a newer bastion, tools-bastion-12, in response to [[phab:T371505|T371505]] * 11:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 11:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 11:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api * 11:33 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api * 10:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-43 * 09:49 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-43 === 2024-07-30 === * 18:08 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 18:06 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 18:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 18:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 18:02 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 18:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 18:02 wmbot~raymond@ubuntu: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-cli * 18:01 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:59 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 17:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:49 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 17:49 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:40 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 17:39 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:37 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 17:36 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 16:34 wmbot~dcaro@urcuchillay: 
END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23 * 16:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23 === 2024-07-29 === * 18:24 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 18:23 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 18:06 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 18:05 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 16:24 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 16:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 14:05 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 14:03 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 13:19 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 13:18 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 12:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-cli * 12:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-cli * 12:01 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-cli * 12:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-cli === 2024-07-25 === * 15:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 15:19 
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 08:37 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 08:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics === 2024-07-24 === * 09:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 09:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx * 08:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission * 08:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission * 07:07 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component ingress-admission * 06:57 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission === 2024-07-23 === * 15:04 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 15:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 13:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 13:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 12:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 12:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 12:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 12:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 12:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 08:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 08:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-07-22 === * 17:42 dcaro: moved the apt repo to service endpoint deb.svc.toolforge.org * 17:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-3 * 17:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-3 * 17:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 17:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 17:00 dcaro: moving the toolforge apt repo to tools-services-06 * 16:55 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-services-06.tools.eqiad1.wikimedia.cloud * 16:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-services-06.tools.eqiad1.wikimedia.cloud * 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 09:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 
2024-07-19 === * 12:46 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 12:46 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.9.2 * 12:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 10:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 10:02 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/nginx-ingress-controller:v1.9.6 * 10:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry === 2024-07-18 === * 14:39 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 14:39 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 08:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 08:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-07-17 === * 14:50 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 11:12 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-builder * 11:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 10:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 10:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 09:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 08:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 08:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx === 2024-07-16 === * 15:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 15:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.24.17 to 1.25.16 * 14:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.24.17 to 1.25.16 * 14:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.24.17 to 1.25.16 * 14:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.24.17 to 1.25.16 * 14:09 
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.24.17 to 1.25.16 * 14:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.24.17 to 1.25.16 * 11:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 11:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:31 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.24.17 to 1.25.16 * 11:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.24.17 to 1.25.16 * 11:30 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.24.17 to 1.25.16 * 11:28 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.24.17 to 1.25.16 * 11:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.24.17 to 1.25.16 * 11:27 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.24.17 to 1.25.16 * 11:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-25 from 1.24.17 to 1.25.16 * 11:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-25 from 1.24.17 to 1.25.16 * 11:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.24.17 to 1.25.16 * 11:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.24.17 to 1.25.16 * 11:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.24.17 to 1.25.16 * 11:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:22 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.24.17 to 1.25.16 * 11:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.24.17 to 1.25.16 * 11:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.24.17 to 1.25.16 * 11:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.24.17 to 1.25.16 * 11:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.24.17 to 1.25.16 * 11:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.24.17 to 1.25.16 * 11:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.24.17 to 1.25.16 * 11:13 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.24.17 to 1.25.16 * 11:12 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.24.17 to 1.25.16 * 11:12 sstefanova@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 11:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 11:10 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-nfs-worker-21 from 1.24.17 to 1.25.16 * 11:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-nfs-worker-21 from 1.24.17 to 1.25.16 * 11:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 * 11:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 * 10:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-104 from 1.24.17 to 1.25.16 * 10:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.24.17 to 1.25.16 * 10:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.24.17 to 1.25.16 * 10:57 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 10:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.24.17 to 1.25.16 * 10:55 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.24.17 to 1.25.16 * 10:54 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.24.17 to 1.25.16 * 10:53 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 
1.24.17 to 1.25.16 * 10:52 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.24.17 to 1.25.16 * 10:51 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.24.17 to 1.25.16 * 10:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 10:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.24.17 to 1.25.16 * 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.24.17 to 1.25.16 * 10:50 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.24.17 to 1.25.16 * 10:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.24.17 to 1.25.16 * 10:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.24.17 to 1.25.16 * 10:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.24.17 to 1.25.16 * 10:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.24.17 to 1.25.16 * 10:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-18 from 1.24.17 to 1.25.16 * 10:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-18 from 1.24.17 to 1.25.16 * 10:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.24.17 to 1.25.16 * 10:47 
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.24.17 to 1.25.16 * 10:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.24.17 to 1.25.16 * 10:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.24.17 to 1.25.16 * 10:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.24.17 to 1.25.16 * 10:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.24.17 to 1.25.16 * 10:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-15 from 1.24.17 to 1.25.16 * 10:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-52 from 1.24.17 to 1.25.16 * 10:44 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-15 from 1.24.17 to 1.25.16 * 10:44 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.24.17 to 1.25.16 * 10:44 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-52 from 1.24.17 to 1.25.16 * 10:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.24.17 to 1.25.16 * 10:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.24.17 to 1.25.16 * 10:43 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-51 from 1.24.17 to 1.25.16 * 10:42 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.24.17 to 1.25.16 * 10:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.24.17 to 1.25.16 * 10:42 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.24.17 to 1.25.16 * 10:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.24.17 to 1.25.16 * 10:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.24.17 to 1.25.16 * 10:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.24.17 to 1.25.16 * 10:40 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.24.17 to 1.25.16 * 10:40 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.24.17 to 1.25.16 * 10:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.24.17 to 1.25.16 * 10:40 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.24.17 to 1.25.16 * 10:39 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.24.17 to 1.25.16 * 10:39 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.24.17 to 1.25.16 * 10:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.24.17 to 1.25.16 * 10:39 sstefanova@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.24.17 to 1.25.16 * 10:38 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.24.17 to 1.25.16 * 10:38 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.24.17 to 1.25.16 * 10:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.24.17 to 1.25.16 * 10:37 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.24.17 to 1.25.16 * 10:37 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.24.17 to 1.25.16 * 10:37 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.24.17 to 1.25.16 * 10:36 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.24.17 to 1.25.16 * 10:35 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.24.17 to 1.25.16 * 10:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.24.17 to 1.25.16 * 10:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.24.17 to 1.25.16 * 10:34 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-worker-nfs-45 from 1.24.17 to 1.25.16 * 10:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.24.17 to 1.25.16 * 10:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.24.17 to 1.25.16 * 10:31 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.24.17 to 1.25.16 * 10:31 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.24.17 to 1.25.16 * 10:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.24.17 to 1.25.16 * 10:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.24.17 to 1.25.16 * 10:28 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.24.17 to 1.25.16 * 10:27 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.24.17 to 1.25.16 * 10:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.24.17 to 1.25.16 * 10:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.24.17 to 1.25.16 * 10:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.24.17 to 1.25.16 * 10:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.24.17 to 1.25.16 * 10:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node 
tools-k8s-worker-nfs-39 from 1.24.17 to 1.25.16 * 10:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.24.17 to 1.25.16 * 10:22 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.24.17 to 1.25.16 * 10:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.24.17 to 1.25.16 * 10:20 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.24.17 to 1.25.16 * 10:19 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.24.17 to 1.25.16 * 10:18 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.24.17 to 1.25.16 * 10:17 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.24.17 to 1.25.16 * 10:16 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.24.17 to 1.25.16 * 10:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.24.17 to 1.25.16 * 10:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.24.17 to 1.25.16 * 10:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node 
tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16 * 10:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 10:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.24.17 to 1.25.16 * 10:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission * 10:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16 * 10:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.24.17 to 1.25.16 * 10:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.24.17 to 1.25.16 * 10:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.24.17 to 1.25.16 * 10:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.24.17 to 1.25.16 * 10:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.24.17 to 1.25.16 * 10:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.24.17 to 1.25.16 * 10:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.24.17 to 1.25.16 * 10:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.24.17 to 1.25.16 * 10:11 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:11 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:10 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16 * 10:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16 * 10:10 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-4 from 1.24.17 to 1.25.16 * 10:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.24.17 to 1.25.16 * 10:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.24.17 to 1.25.16 * 10:09 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-4 from 1.24.17 to 1.25.16 * 10:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.24.17 to 1.25.16 * 10:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.24.17 to 1.25.16 * 10:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.24.17 to 1.25.16 * 09:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.24.17 to 1.25.16 * 09:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.24.17 to 1.25.16 * 09:50 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-1 from 1.24.17 to 1.25.16 * 09:50 aborrero@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-1 from 1.24.17 to 1.25.16 * 09:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.24.17 to 1.25.16 * 09:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.24.17 to 1.25.16 * 09:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.24.17 to 1.25.16 * 09:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.24.17 to 1.25.16 * 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.24.17 to 1.25.16 * 09:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.24.17 to 1.25.16 * 09:07 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.24.17 to 1.25.16 * 09:06 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.24.17 to 1.25.16 * 08:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 08:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission === 2024-07-15 === * 14:42 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:42 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:40 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
builds-builder * 08:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 08:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission === 2024-07-11 === * 17:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 13:49 dcaro: deploy toolforge-jobs-framework 16.0.13 ([[phab:T369573|T369573]]) * 11:55 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 11:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission === 2024-07-10 === * 17:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 17:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 16:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 16:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 16:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 16:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 15:16 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component 
maintain-kubeusers * 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 10:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-07-09 === * 14:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 14:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 14:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:18 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-07-08 === * 20:22 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37 * 20:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37 * 14:09 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 13:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-3 * 13:57 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-3 * 13:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-2 * 13:56 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-2 * 13:56 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-1 * 13:56 
andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-1 * 13:36 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 13:36 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 13:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 13:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 12:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission * 12:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission * 12:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 08:46 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 08:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2024-07-05 === * 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 12:34 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 12:29 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 12:29 
wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4 * 12:29 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7 * 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7 * 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7 * 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7 * 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 12:27 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 12:27 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7 * 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7 * 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7 * 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7 * 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 12:26 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 12:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 12:23 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.7.0 * 12:23 
sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 11:29 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) copy image from bitnami/kubectl:1.26.4 to docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4 * 11:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4 * 11:28 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry copy image from bitnami/kubectl:1.26.4 to docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4 * 01:47 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 01:46 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-07-04 === * 17:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 17:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 12:57 arturo: updating kubelet flags [[phab:T355881|T355881]] * 12:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 12:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 11:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 11:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 09:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 
09:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 07:54 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 07:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway === 2024-07-03 === * 12:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 10:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 09:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission === 2024-07-02 === * 17:16 andrewbogott: draining (I hope) tools-elastic-3 and tools-elastic-1 for [[phab:T311905|T311905]] * 17:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 17:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 16:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 16:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 15:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 15:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 11:53 arturo: cleanup kubeadm configmap from TTLAfterFinished settings 
([[phab:T349197|T349197]]) * 11:51 arturo: remove --feature-gates=TTLAfterFinished=true from kube-controller-manager static pod definition ([[phab:T349197|T349197]]) * 10:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:56 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 09:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 09:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component cert-manager * 09:22 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component cert-manager * 09:10 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 09:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-07-01 === * 15:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 14:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 14:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 13:21 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission * 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission * 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission === 2024-06-28 === * 11:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 11:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 09:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 09:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 09:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 09:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 09:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 09:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 09:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway === 2024-06-27 === * 16:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-23 * 16:44 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-23 * 16:22 andrew@cloudcumin1001: END 
(PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-db-1 * 16:21 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-1 * 15:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-db-1 * 15:49 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-1 * 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-db-3 * 15:46 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-3 * 15:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-24 * 15:37 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-24 * 15:36 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-22 * 15:33 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-22 * 15:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component cert-manager * 15:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component cert-manager * 14:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 14:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx * 11:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:02 arturo: drop all PSP definitions for all accounts ([[phab:T368142|T368142]]) * 
10:02 arturo: disabled PodSecurityPolicy admission plugin from kubeadm configmap ([[phab:T368142|T368142]]) * 09:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 08:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-06-26 === * 11:40 taavi: update pywikibot image to 9.2 [[phab:T363631|T363631]] * 10:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:18 arturo: deploying toolforge-webservice 0.103.9 ([[phab:T368463|T368463]]) * 09:18 arturo: setting kyverno policies to Enforce ([[phab:T368141|T368141]]) * 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-29 * 08:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-29 === 2024-06-25 === * 21:50 bd808: Live hacked /usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py on login-buster.toolforge.org to remove the `-> dict[str, Any]` type annotations causing [[phab:T368463|T368463]] * 12:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-104 * 12:30 taavi@cloudcumin1001: START 
- Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-104 * 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-103 * 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-104 * 12:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-104 * 12:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-103 * 12:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-102 * 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-103 * 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-103 * 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-102 * 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-56 * 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-102 * 12:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-102 * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-56 * 12:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-55 * 12:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-55 * 12:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-54 * 12:22 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-56 * 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-56 * 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-54 * 12:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-53 * 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-55 * 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-55 * 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-53 * 12:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-54 * 12:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-nfs-52 * 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-54 * 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-52 * 12:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 12:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-51 * 12:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-53 * 12:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-51 * 12:11 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-53 * 11:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-50 * 11:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-52 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-50 * 11:56 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-50 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-50 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-52 * 11:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-51 * 11:51 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-50 * 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-51 * 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-50 * 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-50 * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-50 * 11:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-proxy-7 * 11:10 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-proxy-7 * 11:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.migrate_floating_ip (exit_code=0) for address 185.15.56.11 to server 'tools-proxy-8' * 11:09 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.migrate_floating_ip for address 185.15.56.11 to server 'tools-proxy-8' * 09:44 arturo: deploy toolforge-webservice 0.103.8 ([[phab:T362050|T362050]]) * 09:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-haproxy-6 * 09:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-haproxy-6 * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-9 * 09:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-9 * 09:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-9 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-9 * 08:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-49 * 08:48 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-49 * 08:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-48 * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-49 * 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-48 * 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-49 * 08:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs 
(exit_code=0) for server tools-k8s-worker-nfs-47 * 08:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-48 * 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-48 * 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-47 * 08:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-46 * 08:44 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-46 * 08:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-45 * 08:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-47 * 08:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-47 * 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-45 * 08:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-44 * 08:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-46 * 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-46 * 08:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-44 * 08:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-45 * 08:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-45 * 08:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs 
(exit_code=99) for server tools-k8s-worker-nfs-43 * 08:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-43 * 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-42 * 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-44 * 08:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-44 * 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-43 * 08:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-43 * 08:36 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-42 * 08:13 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-42 * 08:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-42 * 08:07 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-42 * 08:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-41 * 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-42 * 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-41 * 08:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-40 * 07:59 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-40 * 07:59 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-39 * 07:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-41 * 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-41 * 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-39 * 07:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-38 * 07:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-40 * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-40 * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-38 * 07:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-37 * 07:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-39 * 07:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-39 * 07:55 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-37 * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-36 * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-38 * 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-38 * 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-36 * 07:41 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-35 * 07:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-37 * 07:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-37 * 07:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-35 * 07:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-34 * 07:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-34 * 07:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-35 * 07:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-33 * 07:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-35 * 07:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-34 * 07:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-34 * 07:31 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-33 * 07:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-33 * 07:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-33 === 2024-06-24 === * 20:56 andrewbogott: rebooting 
tools-k8s-worker-nfs-36; it has lots of stuck processes which somehow didn't get unstuck when we did the post-nfs-migration reboots. * 15:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-32 * 15:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-32 * 15:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-31 * 15:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-32 * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-31 * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-32 * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-30 * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-31 * 15:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-31 * 15:48 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-30 * 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-29 * 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-30 * 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-30 * 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-29 * 15:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for 
server tools-k8s-worker-nfs-28 * 15:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-29 * 15:45 arturo: deploy toolforge-webservice 0.103.7 ([[phab:T362050|T362050]]) * 15:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-29 * 15:44 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-28 * 15:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-27 * 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-28 * 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-27 * 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-28 * 15:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-27 * 15:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-27 * 15:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-sgebastion-10 * 14:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-sgebastion-10 * 14:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-bastion-13 * 14:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-bastion-13 * 14:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-bastion-12 * 14:30 taavi@cloudcumin1001: START - Cookbook 
wmcs.openstack.migrate_server_to_ovs for server tools-bastion-12 * 14:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 14:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-nfs-2 * 14:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-nfs-2 * 13:57 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-nfs-2 * 13:57 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-nfs-2 * 13:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_dbinstance_to_ovs (exit_code=0) for server tbd * 13:43 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_dbinstance_to_ovs for server tbd * 13:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-26 * 13:41 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-26 * 13:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-25 * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-25 * 13:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-26 * 13:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-24 * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-26 * 13:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-25 * 13:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-24 * 
13:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-25 * 13:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-23 * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-24 * 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-23 * 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-24 * 13:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-22 * 13:29 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-22 * 13:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-21 * 13:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-23 * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-23 * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-21 * 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-20 * 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-22 * 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-22 * 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-20 * 13:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-21 * 
13:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-19 * 13:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-21 * 13:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-19 * 13:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-18 * 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-18 * 13:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-20 * 13:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-17 * 13:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-20 * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-19 * 13:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-19 * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-18 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-18 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-17 * 13:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-17 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-17 * 13:15 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-17 * 13:15 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-17 * 13:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-16 * 13:09 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-16 * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-15 * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-16 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-16 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-15 * 12:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-14 * 12:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-15 * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-15 * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-14 * 12:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-13 * 12:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-14 * 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-14 * 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-13 * 12:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-12 
* 12:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-13
* 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-13
* 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-12
* 12:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-11
* 12:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-12
* 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-11
* 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-12
* 12:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-prometheus-7
* 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-11
* 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-11
* 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-prometheus-7
* 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-8
* 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-8
* 12:15 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-8
* 12:13 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-8
* 12:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 12:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-static-15
* 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-static-15
* 12:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-acme-chief-4
* 12:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-acme-chief-4
* 12:00 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-10
* 11:58 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=97) for node tools-k8s-worker-nfs-10
* 11:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-10
* 11:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-10
* 11:56 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-10
* 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-10
* 11:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 11:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-9
* 11:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-9
* 11:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-8
* 11:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-9
* 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-8
* 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-9
* 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-8
* 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-8
* 11:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-7
* 11:37 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-8
* 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-7
* 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-8
* 11:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-7
* 11:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-7
* 11:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-6
* 11:33 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-6
* 11:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-5
* 11:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-5
* 11:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-6
* 11:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-4
* 11:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-6
* 11:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-5
* 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-4
* 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-5
* 11:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-4
* 11:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-4
* 11:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-3
* 11:25 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-3
* 11:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-2
* 11:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-2
* 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-1
* 11:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-1
* 11:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-3
* 11:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-3
* 11:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-2
* 11:20 taavi@cloudcumin1001: START - Cookbook
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-2
* 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-1
* 11:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1
* 11:17 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-1
* 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1
* 10:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-5
* 10:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-5
* 10:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-docker-registry-7
* 10:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-docker-registry-7
* 10:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-7
* 10:11 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-43
* 10:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-7
* 10:09 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-43
* 10:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-7
* 10:06 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-7
* 10:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-7
* 10:03 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-43
* 10:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-7
* 10:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-6
* 09:59 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-6
* 09:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-43
* 09:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-cumin-1
* 09:52 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-cumin-1
* 09:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-haproxy-5
* 09:50 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-haproxy-5
* 09:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-harbor-1
* 09:47 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-harbor-1
* 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 09:46 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-107.tools.eqiad1.wikimedia.cloud to the cluster
* 09:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-prometheus-6
* 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-prometheus-6
* 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-puppetserver-01
* 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-puppetserver-01
* 09:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-puppetdb-2
* 09:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-puppetdb-2
* 09:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-mail-4
* 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 09:30 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-106.tools.eqiad1.wikimedia.cloud to the cluster
* 09:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-mail-4
* 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-legacy-redirector-2
* 09:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-legacy-redirector-2
* 09:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-imagebuilder-2
* 09:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-imagebuilder-2
* 09:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-proxy-8
* 09:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-proxy-8
* 09:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-services-05
* 09:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-services-05
* 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-package-builder-04
* 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-package-builder-04
* 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 09:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-docker-registry-8
* 09:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
* 09:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 09:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-docker-registry-8
* 09:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-checker-5
* 09:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 09:18 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-105.tools.eqiad1.wikimedia.cloud to the cluster
* 09:18 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-checker-5
* 09:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 09:08 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
* 09:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
=== 2024-06-20 ===
* 13:09 arturo: re-deploy kyverno [[phab:T368044|T368044]]
* 12:56 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 12:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 09:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 09:08 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-06-19 ===
* 10:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 10:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 10:11 arturo: merging k8s HAproxy change https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047113
* 04:18 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 04:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 04:16 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 04:15 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-06-14 ===
* 14:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 08:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 08:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 07:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 07:35 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-06-12 ===
* 19:41 bd808: Rebuilding all shared Docker containers. This will among other things apply the fix for [[phab:T367345|T367345]].
* 17:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 17:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 17:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 17:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 16:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 16:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 15:24 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 15:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 15:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 15:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 13:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 13:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 13:45 taavi: hard reboot tools-k8s-control-7
* 12:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 12:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-06-11 ===
* 17:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers
* 16:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 16:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 16:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 16:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 16:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 16:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers
* 15:50 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all NFS workers
* 15:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers
* 11:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 11:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 11:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 11:12 dcaro@cloudcumin1001: START - Cookbook
wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:57 dcaro: cleaning old maintain-kubeusers configmaps
* 10:45 dcaro: cleaning up old resourcequotas
=== 2024-06-10 ===
* 09:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 09:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
=== 2024-06-07 ===
* 10:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 10:09 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 09:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 09:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-06-06 ===
* 14:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 14:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 12:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:06 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-06-05 ===
* 16:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 13:27 dcaro: deploying toolforge-webservice 0.103.6
* 12:58 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 12:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 08:44 dcaro: deploying toolforge-jobs-framework-cli 16.0.10 on tools-bastion-13
* 08:41 dcaro: deploying toolforge-jobs-framework-cli 16.0.10 on tools-bastion-12
=== 2024-06-04 ===
* 16:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 12:47 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 12:47 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 12:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 12:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 12:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 12:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 09:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 08:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 08:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-06-03 ===
* 16:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 16:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 16:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:04 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 16:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 16:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 15:58 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 15:57 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 14:11 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 12:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 12:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:16 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
* 10:15 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7
* 10:15 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7
* 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7
* 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7
* 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7
* 10:14 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 10:13 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99)
* 10:13 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7
* 10:13 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 09:37 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99)
* 09:37 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7
* 09:37 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 09:29 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99)
* 09:29 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 09:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 09:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 09:29 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99)
* 09:28 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 09:13 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 09:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 08:43 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 08:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
=== 2024-05-29 ===
* 16:14 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 16:13 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 02:59 wmbot~raymond@ubuntu: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component envvars-api
* 02:59 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-05-28 ===
* 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 10:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-05-27 ===
* 15:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 15:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 09:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9
* 09:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9
=== 2024-05-25 ===
* 21:33 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 21:32 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 20:38 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 20:37 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-05-23 ===
* 13:22 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 13:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2024-05-22 ===
* 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9
* 16:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9
=== 2024-05-15 ===
* 14:17 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]])
* 14:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]])
* 14:11 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]])
* 14:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]])
* 10:26 dcaro@cloudcumin1001: END (PASS) - Cookbook
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-05-14 ===
* 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 13:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 07:48 dcaro: draining tools-k8s-worker-nfs-9 as it's stuck on IO
* 07:48 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-9
* 07:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-9
=== 2024-05-07 ===
* 16:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 16:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 12:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-05-06 ===
* 12:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 12:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 08:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 08:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 07:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 07:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-05-05 ===
* 07:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx
* 07:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx
=== 2024-05-03 ===
* 15:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 15:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 12:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 10:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-04-30 ===
* 10:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 10:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
=== 2024-04-26 ===
* 08:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 08:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 08:57 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 08:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-04-25 ===
* 12:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 09:48 taavi: update pywikibot script image to v9.1.0 [[phab:T363132|T363132]]
=== 2024-04-24 ===
* 15:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 15:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
=== 2024-04-18 ===
* 09:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 09:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-04-17 ===
* 20:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50
* 20:48 andrewbogott: In response to stuck processes (NFS?), running sudo cookbook wmcs.toolforge.k8s.reboot --hostname-list tools-k8s-worker-nfs-50 --cluster-name tools
* 20:48 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50
* 15:21 dcaro: swapped login.toolforge.org to point to tools-bastion-13
* 10:48 dcaro: rebooting tools-k8s-worker-nfs-1
=== 2024-04-16 ===
* 11:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-1
* 11:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1
* 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'python3-toolforge-weld' version '1.5.0'
* 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'python3-toolforge-weld' version '1.5.0'
=== 2024-04-15 ===
* 20:34 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 20:33 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component
builds-api * 18:28 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 18:27 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 14:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 13:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 13:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 13:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 13:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 11:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 09:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2024-04-12 === * 10:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 10:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission * 09:35 wmbot~dcaro@urcuchillay: END (PASS) - 
Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 09:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 09:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 09:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 01:19 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 01:18 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 01:18 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component calico * 01:17 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission * 01:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component calico * 01:17 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 01:16 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission * 01:16 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 01:15 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 01:14 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 01:13 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 01:12 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 01:11 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for 
component maintain-kubeusers === 2024-04-11 === * 08:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 08:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2024-04-09 === * 17:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 17:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 17:11 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 17:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 16:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 16:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 14:23 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 14:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:23 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 14:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:22 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) * 14:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:11 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 13:43 dcaro: deployed builds-builder 0.0.94 and removed builds-admission * 13:39 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 13:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
builds-builder * 12:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:21 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:19 dcaro: deploying toolforge-jobs-cli 16.0.6 === 2024-04-08 === * 16:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 16:24 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 16:21 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 16:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 16:09 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 16:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 15:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 14:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 14:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 14:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 14:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 14:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 14:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 * 14:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 * 13:56 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:54 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:53 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-56 * 13:53 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 13:52 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-56 * 13:51 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:45 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:40 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:37 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 13:32 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 13:31 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console 
(exit_code=255) * 13:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 13:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 13:24 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:19 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:12 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 10:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 08:55 dcaro_: deploy toolforge-jobs-framework-cli 16.0.5 === 2024-04-05 === * 12:15 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:15 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-04-03 === * 15:01 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 15:00 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:59 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:59 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:58 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:58 
wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:57 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:57 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:49 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:49 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:37 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:37 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 11:24 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06 * 11:24 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06 * 11:23 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06 * 11:23 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06 * 11:21 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06 * 11:21 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06 * 09:45 taavi: rebuilding prebuild images for [[phab:T361457|T361457]] === 2024-04-02 === * 12:39 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-2 ([[phab:T344717|T344717]]) * 12:38 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-2 ([[phab:T344717|T344717]]) * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-registry-05 * 07:54 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.remove_instance for instance tools-docker-registry-05 === 2024-03-28 === * 14:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-05 * 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-05 * 13:45 taavi: migrating toolforge.org floating IP from tools-proxy-06 to tools-proxy-7 [[phab:T361223|T361223]] * 13:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-proxy' * 13:30 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-proxy' * 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-proxy' * 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-proxy' * 12:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-registry-06 * 12:12 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-registry-06 * 11:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-docker-registry' * 11:02 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-docker-registry' === 2024-03-27 === * 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance toolserver-proxy-01 * 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance toolserver-proxy-01 === 2024-03-26 === * 16:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud * 16:47 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud * 16:41 taavi@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud * 16:39 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud * 16:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-docker-registry' * 16:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-docker-registry' * 12:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-bastion-13.tools.eqiad1.wikimedia.cloud * 12:54 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-13.tools.eqiad1.wikimedia.cloud * 12:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-bastion' * 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-bastion' * 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-sgebastion-11 * 12:43 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-sgebastion-11 * 10:24 taavi: point toolserver.org DNS to tools-legacy-redirector-2 [[phab:T311909|T311909]] === 2024-03-25 === * 18:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-legacy-redirector * 18:23 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-legacy-redirector * 14:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud * 14:27 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud * 14:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on 
tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud * 14:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud * 14:18 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud * 14:18 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud === 2024-03-22 === * 11:43 dcaro: restarted sssd on tools-prometheus-6 as it was stopped (error) === 2024-03-21 === * 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-4 * 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-4 * 15:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-3 * 15:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-3 * 15:42 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=99) for node toolsbeta-k8s-haproxy-3 * 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node toolsbeta-k8s-haproxy-3 * 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) * 15:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 12:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) * 12:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node === 2024-03-20 === * 13:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-checker-04 * 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-checker-04 * 12:30 taavi: move checker service 
address to tools-checker-5 * 11:24 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 11:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-checker-5.tools.eqiad1.wikimedia.cloud * 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-checker-5.tools.eqiad1.wikimedia.cloud * 10:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-checker-5.tools.eqiad1.wikimedia.cloud * 10:39 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-checker-5.tools.eqiad1.wikimedia.cloud * 10:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-checker' * 10:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker' * 10:33 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-checker' * 10:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker' * 10:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 10:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase * 10:22 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-checker' * 10:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker' === 2024-03-19 === * 21:28 taavi: kick off full container image rebuild for https://gerrit.wikimedia.org/r/1012753 (python3 backwards compat in lighttpd images) and https://gerrit.wikimedia.org/r/1010690 (add procps to base images) * 11:22 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.vps.remove_instance (exit_code=0) for instance tools-static-14 * 11:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-static-14 * 11:19 taavi: point dev.toolforge.org to tools-bastion-12 [[phab:T314665|T314665]] * 10:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:38 dcaro: pushed docker-registry.tools.wmflabs.org/cloud-cicd-py311bookworm-tox:latest and docker-registry.tools.wmflabs.org/cloud-cicd-debian-builder-bookworm:2024-03-24.1 ([[phab:T360405|T360405]]) === 2024-03-18 === * 13:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:30 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-104 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:13 taavi: restart harbor services after 
docker service restart * 13:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:00 aborrero@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-52 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:58 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-52 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:58 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-51 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:57 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:57 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:56 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade 
for node tools-k8s-worker-nfs-48 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:53 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 
1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:47 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:44 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:36 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:35 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:34 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-filesystemtest-1 * 12:33 aborrero@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:33 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-filesystemtest-1 * 12:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:29 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:27 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.23.17 to 1.24.17 
([[phab:T307651|T307651]]) * 12:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:25 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:25 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:24 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:22 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:22 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:21 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:20 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-25 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-25 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:18 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:18 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-acme-chief-4.tools.eqiad1.wikimedia.cloud * 12:15 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:15 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:14 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on 
tools-acme-chief-4.tools.eqiad1.wikimedia.cloud * 12:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:11 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 12:04 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:01 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:01 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 12:00 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 12:00 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 11:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:55 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.23.17 to 1.24.17 
([[phab:T307651|T307651]]) * 11:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:53 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-18 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-18 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-15 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-15 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:49 
aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:47 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:42 aborrero@cloudcumin1001: START - 
Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:40 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:39 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:39 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:33 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:33 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-4 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-4 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for 
node tools-k8s-worker-nfs-3 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:30 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:23 taavi: point tools-static proxy to tools-static-15 (bookworm) [[phab:T311913|T311913]] * 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:13 aborrero@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 11:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 11:00 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-api * 11:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 10:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 10:04 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-bastion-12.tools.eqiad1.wikimedia.cloud * 10:03 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.refresh_puppet_certs on tools-bastion-12.tools.eqiad1.wikimedia.cloud
* 09:27 taavi: deleted shutdown grid engine VMs [[phab:T314664|T314664]]
=== 2024-03-15 ===
* 10:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 10:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-03-14 ===
* 17:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'misctools' version '1.48'
* 17:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'misctools' version '1.48'
* 15:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-imagebuilder-01
* 15:16 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01
* 15:11 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-docker-imagebuilder-01
* 15:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01
* 15:10 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-docker-imagebuilder-01
* 15:09 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01
* 11:02 taavi: stop grid related VMs [[phab:T314664|T314664]]
* 11:01 taavi: disable grid access for remaining tools still running on the grid [[phab:T314664|T314664]]
=== 2024-03-13 ===
* 19:21 andrewbogott: shutting down old puppet infra: tools-puppetmaster-02 and tools-puppetdb-1. These can be deleted in a week or two presuming everything remains stable.
=== 2024-03-12 ===
* 12:38 taavi: hard reboot tools-prometheus-6
* 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-03-11 ===
* 16:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 16:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 13:20 arturo: cached registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0 as docker-registry.tools.wmflabs.org/kube-state-metrics:v2.6.0 in the docker registry for [[phab:T359798|T359798]]
=== 2024-03-09 ===
* 12:48 taavi: hard reboot tools-sgebastion-10 due to stuck NFS procs
=== 2024-03-08 ===
* 12:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 12:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-03-07 ===
* 14:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 13:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 13:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-03-06 ===
* 10:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-32
* 10:47 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_grid_node (exit_code=1) for tools-sgeweblight-10-17, tools-sgeweblight-10-32
* 10:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for
tools-sgeweblight-10-17, tools-sgeweblight-10-32
* 10:34 taavi: rebuilding all docker images for https://gerrit.wikimedia.org/r/c/operations/docker-images/toollabs-images/+/1005952 ([[phab:T293552|T293552]]) + normal package updates
* 09:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0)
* 09:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors
* 09:42 taavi: reboot tools-sgeexec-10-20, -21, -23, sgeweblight-10-32 due to stuck nfs procs
=== 2024-03-05 ===
* 16:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
* 16:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
* 16:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 16:09 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 16:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 16:07 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase
* 16:06 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.openstack.quota_increase (exit_code=97) ([[phab:T357901|T357901]])
* 16:06 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T357901|T357901]])
* 16:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
* 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
=== 2024-03-04 ===
* 17:56 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 17:56 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component
maintain-kubeusers
* 16:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:57 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 12:43 taavi: reboot tools-sgegrid-shadow due to high number of procs in D state
=== 2024-03-03 ===
* 10:38 dcaro: reboot tools-k8s-worker-nfs-55 got nfs lockup (logrotate in D state)
=== 2024-03-01 ===
* 21:14 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 21:14 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2024-02-29 ===
* 14:36 dcaro: deploy webservice 0.103.3
=== 2024-02-28 ===
* 11:57 dcaro: deploy tools-webservice 0.103.2 with probes ([[phab:T341919|T341919]])
* 00:46 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 00:46 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-02-26 ===
* 09:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) ([[phab:T284656|T284656]])
* 09:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node ([[phab:T284656|T284656]])
* 09:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster
* 09:35 aborrero@cloudcumin1001: Added a new k8s control tools-k8s-control-9.tools.eqiad1.wikimedia.cloud to the cluster
* 09:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster ([[phab:T284656|T284656]])
=== 2024-02-23 ===
* 14:19 taavi: remove isc-dhcp-server (server, not client) from tools-db-2
* 13:32 taavi: remove toolschecker alerts for grid engine jobs [[phab:T358333|T358333]]
=== 2024-02-22 ===
* 14:26
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 14:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 14:24 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:17 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:17 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:07 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component envvars-api
* 14:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 14:03 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component envvars-api
* 14:03 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 11:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) ([[phab:T284656|T284656]])
* 11:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node ([[phab:T284656|T284656]])
* 11:15 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 11:15 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-104.tools.eqiad1.wikimedia.cloud to the cluster
* 11:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 10:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 10:51 taavi@cloudcumin1001: START - Cookbook
wmcs.toolforge.k8s.component.deploy for component jobs-api
* 09:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster
* 09:39 aborrero@cloudcumin1001: Added a new k8s control tools-k8s-control-8.tools.eqiad1.wikimedia.cloud to the cluster
* 09:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster ([[phab:T284656|T284656]])
* 08:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-51
* 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-51
* 08:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-38
* 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-38
* 08:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-25
* 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-25
=== 2024-02-21 ===
* 17:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 17:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 15:48 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 15:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 14:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 14:34 wmbot~dcaro@urcuchillay: END (PASS) -
Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 14:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 09:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-control-4
* 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-control-4
* 09:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster
* 09:20 taavi@cloudcumin1001: Added a new k8s control tools-k8s-control-7.tools.eqiad1.wikimedia.cloud to the cluster
* 09:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster
=== 2024-02-20 ===
* 16:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 16:12 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-103.tools.eqiad1.wikimedia.cloud to the cluster
* 16:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-102
* 16:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-102
* 16:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 15:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-101
* 15:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-101
* 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook
wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 15:48 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-102.tools.eqiad1.wikimedia.cloud to the cluster * 15:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-102 * 15:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-102 * 15:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 15:38 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-102.tools.eqiad1.wikimedia.cloud to the cluster * 15:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud * 15:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud * 12:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:57 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-56.tools.eqiad1.wikimedia.cloud to the cluster * 12:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-100 * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-100 * 12:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:40 taavi@cloudcumin1001: Added a new k8s 
worker-nfs tools-k8s-worker-nfs-55.tools.eqiad1.wikimedia.cloud to the cluster * 12:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-99 * 12:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-99 * 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:29 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-54.tools.eqiad1.wikimedia.cloud to the cluster * 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-98 * 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-98 * 12:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:18 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-53.tools.eqiad1.wikimedia.cloud to the cluster * 12:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-97 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-97 * 11:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-52.tools.eqiad1.wikimedia.cloud to the cluster * 11:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a 
worker-nfs role in the tools cluster * 11:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-96 * 11:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-96 * 11:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:36 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud to the cluster * 11:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:26 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-50.tools.eqiad1.wikimedia.cloud to the cluster * 11:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:16 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-49.tools.eqiad1.wikimedia.cloud to the cluster * 11:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-95 * 11:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-95 * 10:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-94 * 10:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-94 * 10:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host 
tools-k8s-worker-93 * 10:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-93 * 10:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 10:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-48.tools.eqiad1.wikimedia.cloud to the cluster * 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-92 * 10:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-92 * 09:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-ingress-6 * 09:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-6 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-9.tools.eqiad1.wikimedia.cloud to the cluster * 09:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-47.tools.eqiad1.wikimedia.cloud to the cluster * 09:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 09:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-91 * 09:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-91 * 09:15 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:15 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-46.tools.eqiad1.wikimedia.cloud to the cluster * 09:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:02 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 09:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-90 * 08:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-90 * 08:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:57 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-45.tools.eqiad1.wikimedia.cloud to the cluster * 08:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-89 * 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-89 * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:47 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-44.tools.eqiad1.wikimedia.cloud to the cluster * 08:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-88 * 08:36 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-88 === 2024-02-19 === * 19:04 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 19:03 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-ingress-5 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-5 * 13:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:09 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-43.tools.eqiad1.wikimedia.cloud to the cluster * 12:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-87 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-87 * 12:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-42.tools.eqiad1.wikimedia.cloud to the cluster * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-86 * 12:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-86 * 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:44 taavi@cloudcumin1001: Added a new k8s worker-nfs 
tools-k8s-worker-nfs-41.tools.eqiad1.wikimedia.cloud to the cluster * 12:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T357901|T357901]]) * 12:33 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T357901|T357901]]) * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud * 12:24 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-85 * 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-85 * 12:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:18 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-40.tools.eqiad1.wikimedia.cloud to the cluster * 12:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-84 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-84 * 12:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:04 taavi@cloudcumin1001: Added a new k8s worker-nfs 
tools-k8s-worker-nfs-39.tools.eqiad1.wikimedia.cloud to the cluster * 11:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-83 * 11:53 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-83 * 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:50 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud to the cluster * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-82 * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-82 * 11:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:39 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-37.tools.eqiad1.wikimedia.cloud to the cluster * 11:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-81 * 11:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-81 * 09:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:57 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy 
(exit_code=0) for component maintain-kubeusers * 08:57 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-02-16 === * 15:28 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 12:21 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-8.tools.eqiad1.wikimedia.cloud to the cluster * 12:14 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 10:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 10:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 10:32 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 10:31 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:59 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-36.tools.eqiad1.wikimedia.cloud to the cluster * 09:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-80 * 09:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host 
tools-k8s-worker-80 * 09:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:45 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-35.tools.eqiad1.wikimedia.cloud to the cluster * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-79 * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-79 * 09:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:24 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-34.tools.eqiad1.wikimedia.cloud to the cluster * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-78 * 09:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-78 * 09:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:05 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-33.tools.eqiad1.wikimedia.cloud to the cluster * 08:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-77 * 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-77 === 2024-02-15 === * 13:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host 
tools-k8s-ingress-4 * 13:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-4 * 13:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:02 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-32.tools.eqiad1.wikimedia.cloud to the cluster * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-76 * 12:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-76 * 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:44 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-31.tools.eqiad1.wikimedia.cloud to the cluster * 12:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-75 * 12:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-75 * 11:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 11:37 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-7.tools.eqiad1.wikimedia.cloud to the cluster * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 11:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-ingress-7 * 11:29 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-ingress-7 * 11:29 taavi@cloudcumin1001: END (FAIL) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a ingress role in the tools cluster * 11:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster === 2024-02-14 === * 19:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-17, tools-sgeweblight-10-30 * 16:35 taavi: kill jobs user 'wikishizhao' is running directly on the grid per https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rules #3 * 16:30 taavi: reboot tools-sgeexec-10-23 due to high load * 09:14 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud * 09:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:07 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-30.tools.eqiad1.wikimedia.cloud to the cluster * 08:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-74 * 08:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-74 * 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:54 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-29.tools.eqiad1.wikimedia.cloud to the cluster * 08:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 08:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-73 * 08:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-73 * 08:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:43 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-28.tools.eqiad1.wikimedia.cloud to the cluster * 08:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-72 * 08:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-72 * 08:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:32 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-27.tools.eqiad1.wikimedia.cloud to the cluster * 08:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-71 * 08:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-71 * 08:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:21 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-26.tools.eqiad1.wikimedia.cloud to the cluster * 08:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-70 * 08:07 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-70 * 08:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:05 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud to the cluster * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-69 * 07:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-69 * 07:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 07:53 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-24.tools.eqiad1.wikimedia.cloud to the cluster * 07:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 07:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-68 * 07:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-68 === 2024-02-13 === * 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-67 * 15:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-67 * 15:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 15:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-23.tools.eqiad1.wikimedia.cloud to the cluster * 15:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:31 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-66 * 15:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-66 * 15:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 15:30 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-22.tools.eqiad1.wikimedia.cloud to the cluster * 15:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-65 * 15:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-65 * 09:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:36 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-21.tools.eqiad1.wikimedia.cloud to the cluster * 09:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-64 * 09:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-64 === 2024-02-12 === * 14:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 14:58 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-20.tools.eqiad1.wikimedia.cloud to the cluster * 14:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-62 * 14:47 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-62 * 14:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 14:47 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-19.tools.eqiad1.wikimedia.cloud to the cluster * 14:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-61 * 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-61 * 13:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-60 * 13:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-60 * 13:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:43 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-18.tools.eqiad1.wikimedia.cloud to the cluster * 13:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-59 * 13:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-59 * 13:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-58 * 13:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-58 * 13:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:22 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-17.tools.eqiad1.wikimedia.cloud 
to the cluster * 13:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-57 * 13:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-57 * 13:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-56 * 13:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-56 * 13:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:09 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-16.tools.eqiad1.wikimedia.cloud to the cluster * 12:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-55 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-55 * 12:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-54 * 12:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-54 * 12:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-15.tools.eqiad1.wikimedia.cloud to the cluster * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-15 * 12:45 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-15 * 12:44 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-53 * 12:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-53 * 12:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-52 * 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-52 * 10:51 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 10:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-02-11 === * 11:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-02-09 === * 18:03 andrewbogott: updated the default security group, removing the 0.0.0.0/0 rule allowing port 22 access everywhere, replaced it with a 172.16.0.0/21 rule * 13:06 taavi: reboot tools-sgecron-2 due to high load * 10:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component image-config * 10:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component image-config * 09:56 taavi@cloudcumin1001: 
END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-14.tools.eqiad1.wikimedia.cloud to the cluster * 09:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-51 * 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-51 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-50 * 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-50 * 08:56 dcaro: restart tools-k8s-worker-50 due to some processes stuck in D state === 2024-02-08 === * 13:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 13:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-13.tools.eqiad1.wikimedia.cloud to the cluster * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-49 * 09:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-49 * 09:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-48 * 09:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-48 * 09:32 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:32 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-12.tools.eqiad1.wikimedia.cloud to the cluster * 09:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-47 * 09:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-47 * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-46 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-46 * 09:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:21 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-11.tools.eqiad1.wikimedia.cloud to the cluster * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-45 * 09:11 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-45 * 09:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-44 * 09:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-44 * 09:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:10 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-10.tools.eqiad1.wikimedia.cloud to the cluster * 09:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a 
worker-nfs role in the tools cluster * 08:59 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 08:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-43 * 08:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-43 * 08:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-42 * 08:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-42 === 2024-02-07 === * 21:33 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers * 18:00 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 17:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 * 17:24 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 17:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 17:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:03 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for all workers * 17:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:01 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers === 
2024-02-06 === * 13:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes ([[phab:T356507|T356507]]) * 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all nodes ([[phab:T356507|T356507]]) === 2024-01-31 === * 14:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-01-30 === * 19:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 19:24 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-9.tools.eqiad1.wikimedia.cloud to the cluster * 19:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 19:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-9 * 19:16 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-9 * 19:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 19:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 19:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 19:12 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-8.tools.eqiad1.wikimedia.cloud to the cluster * 19:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 19:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-8 * 19:03 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-8 * 18:51 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-8 * 18:47 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-8 * 18:46 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 18:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-7.tools.eqiad1.wikimedia.cloud to the cluster * 18:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-41 * 18:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-41 * 18:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-40 * 18:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-40 * 18:22 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-39 * 18:22 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-39 * 18:18 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-38 * 18:17 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-38 * 18:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-37 * 18:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-37 * 15:16 dcaro: restart harbor now that the db is clean ([[phab:T356037|T356037]]) * 15:14 dcaro: restart harbor now that the db is clean ([[phab:T3543|T3543]]) * 13:08 taavi: create no-op DMARC record [[phab:T354112|T354112]] * 12:39 dcaro: rebuilding all the toolforge images ([[phab:T354320|T354320]]) * 10:16 dcaro: restarting harbor and flushing redis to regenerate cache data ([[phab:T356037|T356037]]) * 09:33 dcaro: cleaning up old schedules on harbor ([[phab:T356037|T356037]]) === 2024-01-29 === * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-36 * 19:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-36 * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-36 * 14:36 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-mail-4.tools.eqiad1.wikimedia.cloud * 14:34 wmbot~taavi@runko: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-mail-4.tools.eqiad1.wikimedia.cloud * 12:06 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:06 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-6.tools.eqiad1.wikimedia.cloud to the cluster * 11:55 wmbot~taavi@runko: START - Cookbook 
wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:51 wmbot~taavi@runko: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 11:51 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:37 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:37 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-5.tools.eqiad1.wikimedia.cloud to the cluster * 11:26 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:23 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:22 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-4.tools.eqiad1.wikimedia.cloud to the cluster * 11:12 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:12 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-35 * 11:10 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-35 * 11:10 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-34 * 11:09 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-34 * 11:09 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-33 * 11:07 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-33 * 11:06 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-32 * 11:04 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-32 * 11:01 wmbot~taavi@runko: 
START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-31 * 10:59 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-30 * 10:57 wmbot~taavi@runko: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 10:56 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:51 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 10:51 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-3.tools.eqiad1.wikimedia.cloud to the cluster * 10:46 blancadesal: increased harbor quota for wd-shex-infer to 2GiB * 10:44 blancadesal: increased harbor quota for lucaswerkmeister-test to 2GiB * 10:31 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 10:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-01-26 === * 10:56 taavi: copy helmfile_0.144.0-1_all to bookworm-tools, bookworm-toolsbeta === 2024-01-25 === * 13:17 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:04 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-24 === * 09:54 dcaro: deploy toolforge-jobs-framework-cli 16.0.1 === 2024-01-23 === * 19:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 19:11 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 14:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 14:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 13:31 taavi: rebooting tools-sgeexec-10-21, tools-sgeexec-10-22 * 12:58 dcaro: deployed toolforge-envvars-cli 0.0.4 * 10:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-19 === * 15:40 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 15:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-01-18 === * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-17 === 2024-01-17 === * 18:16 dhinus: increase volume quotas for toolsdb [[phab:T344717|T344717]] * 18:14 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) ([[phab:T344717|T344717]]) * 18:14 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase 
([[phab:T344717|T344717]]) * 14:34 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 14:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 08:56 taavi: update all pre-built docker images [[phab:T352886|T352886]] === 2024-01-15 === * 09:18 taavi: reboot stuck tools-k8s-worker-84 === 2024-01-12 === * 09:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-builds-cli' version '0.0.12' * 09:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-builds-cli' version '0.0.12' === 2024-01-11 === * 17:30 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 17:12 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:12 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 15:14 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 15:13 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-10 === * 22:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 22:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 09:17 taavi: reboot tools-k8s-worker-98 === 2024-01-09 === * 23:37 andrewbogott: restarting harbor-db in an attempt to reform harbor -- [[phab:T354714|T354714]] * 23:30 andrewbogott: rebooting tools-harbor-1 in a feeble attempt to get it to work (docker-compose can't restart it) * 23:12 andrew@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-builder * 23:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 23:11 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds.builder * 23:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds.builder * 17:31 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:30 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:13 taavi: reboot tools-sgeexec-10-17 due to high load === 2024-01-08 === * 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-27, tools-sgeweblight-10-28 * 10:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:17 taavi: reboot tools-sgeexec-10-21 === 2024-01-05 === * 14:55 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 14:55 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 11:56 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:55 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:29 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 10:29 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-01-04 === * 10:11 dcaro: deploy toolforge-envvars-cli 0.0.3 === 2024-01-03 === * 21:22 
andrewbogott: truncating 200 logfiles to 5M on tools nfs * 21:17 andrewbogott: deleting many stray core dumps throughout nfs storage === 2024-01-02 === * 11:06 dcaro: restart toolsdb database to flush connections ([[phab:T354176|T354176]]) * 10:42 dcaro: flushed the redis db on tools-harbor-1 ([[phab:T354176|T354176]]) * 10:37 dcaro: hard reboot tools-harbor-1 * 10:13 dhinus: hard reboot tools-harbor-1 === 2024-01-01 === * 15:55 andrewbogott: rebooting tools-harbor-1, [[phab:T354151|T354151]] ==Archives== * [[Nova Resource:Tools/SAL/Archive 1|Archive 1]] (2013-2014) * [[Nova Resource:Tools/SAL/Archive 2|Archive 2]] (2015-2017) * [[Nova Resource:Tools/SAL/Archive 3|Archive 3]] (2018-2019) * [[Nova Resource:Tools/SAL/Archive 4|Archive 4]] (2020-2021) * [[Nova Resource:Tools/SAL/Archive 5|Archive 5]] (2022-2023) </noinclude> {{SAL|Project Name=tools}} <noinclude>[[Category:SAL]]</noinclude> 64v5eyfh3d2quqi0z233tgxq73o0mi6 2414288 2414287 2026-05-15T19:09:26Z Stashbot 7414 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers 2414288 wikitext text/x-wiki === 2026-05-15 === * 19:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 19:02 taavi: rebooting bastions and k8s workers to pick up kernel updates === 2026-05-14 === * 16:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 16:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 15:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 15:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:16 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api * 15:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 13:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 13:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway * 13:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component istio-gateway * 13:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway === 2026-05-13 === * 12:07 godog: resume restarting webservices using default memory requests - [[phab:T420565|T420565]] * 08:46 godog: restart sample webservices with new memory requests https://phabricator.wikimedia.org/P92497 - [[phab:T420565|T420565]] * 08:36 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 08:35 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 00:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 00:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2026-05-12 === * 23:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 23:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 22:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 22:47 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component logs-api * 22:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 21:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2026-05-11 === * 00:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component image-config * 00:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 00:39 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component image-config * 00:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config === 2026-05-07 === * 12:02 taavi: draining tools-k8s-worker-106 to investigate [[phab:T425172|T425172]] === 2026-05-05 === * 04:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 04:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 04:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 03:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 02:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 02:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 02:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component components-api * 02:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 01:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2026-04-28 === * 10:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-04-23 === * 15:59 andrewbogott: hard rebooting tools-puppetserver-01.tools, it seems to have crashed * 09:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) * 09:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) * 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) * 09:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:12 taavi: uninstall ingress-nginx-gen2 from the cluster [[phab:T392356|T392356]] * 08:08 taavi: delete all ingress objects [[phab:T392356|T392356]] === 2026-04-21 === * 14:06 taavi: save backup of all ingress objects to ~taavi/ingresses-backup-2026-04-21.json [[phab:T392356|T392356]] * 13:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 13:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers 
* 12:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli
* 12:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli
=== 2026-04-20 ===
* 15:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 15:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
=== 2026-04-16 ===
* 13:09 taavi: bump istio traffic percentage 75% -> 100% [[phab:T392356|T392356]]
=== 2026-04-15 ===
* 10:45 taavi: bump istio traffic percentage 50% -> 75% [[phab:T392356|T392356]]
=== 2026-04-13 ===
* 09:11 taavi: bump istio traffic percentage 25% -> 50% [[phab:T392356|T392356]]
* 07:33 taavi: bump istio traffic percentage 10% -> 25% [[phab:T392356|T392356]]
=== 2026-04-10 ===
* 14:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 14:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 11:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 11:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 08:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
* 08:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
* 08:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
* 08:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
=== 2026-04-09 ===
* 14:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 13:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 06:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
* 06:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
* 06:24 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx
* 06:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
* 06:12 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx
* 06:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
=== 2026-04-08 ===
* 17:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 17:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 17:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api
* 17:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api
* 15:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 15:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 08:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 08:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 00:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api
=== 2026-04-07 ===
* 23:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api
* 19:20 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 19:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 19:09 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 19:00 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 18:59 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 18:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 18:43 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 18:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 18:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
* 18:07 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T361237|T361237]])
* 18:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 17:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 17:53 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 17:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 17:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 17:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 17:18 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 17:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 17:03 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 17:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 17:01 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 17:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 16:59 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 16:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 16:59 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=97)
* 16:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 16:57 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 16:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 16:52 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 16:50 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 16:48 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 16:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]])
* 16:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
* 16:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T361237|T361237]])
* 16:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=0)
* 15:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 15:52 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 15:51 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 15:33 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 15:31 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 15:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 15:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 15:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 15:06 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 14:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T361237|T361237]])
* 14:43 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 14:42 andrewbogott: replacing etcd nodes with bookworm-based VMs
* 14:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 13:33 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 13:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 12:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 09:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 09:51 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 09:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 09:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 09:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 09:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 08:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 08:57 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
* 08:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 08:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 08:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 07:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
* 07:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
=== 2026-04-02 ===
* 17:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 17:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 16:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 16:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 16:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 16:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api
* 16:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api
* 16:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 15:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 15:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 15:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 14:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 14:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 13:11 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 12:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 12:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 11:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 10:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 10:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2026-04-01 ===
* 18:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 18:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 12:33 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 12:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 11:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 11:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 09:57 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
* 09:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2026-03-31 ===
* 18:02 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api
* 18:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 17:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli
* 17:56 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
* 12:31 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api
* 12:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api
=== 2026-03-30 ===
* 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 12:05 dcaro: removing wal from prometheus nodes to restart them
=== 2026-03-26 ===
* 17:30 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component wmcs-k8s-metrics
* 17:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics
* 14:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) for image docker-registry.svc.toolforge.org/cadvisor:0.56.2
* 14:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/cadvisor:0.56.2
* 14:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image docker-registry.svc.toolforge.org/cadvisor:0.56.2
* 14:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/cadvisor:0.56.2
* 10:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 10:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
=== 2026-03-25 ===
* 13:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-package-builder-04
* 13:43 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-package-builder-04
=== 2026-03-24 ===
* 17:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 17:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 13:04 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 12:51 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2026-03-23 ===
* 11:16 taavi: send 10% of traffic to istio [[phab:T392356|T392356]]
* 10:53 taavi: send 5% of traffic to istio [[phab:T392356|T392356]]
* 10:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
* 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
=== 2026-03-19 ===
* 20:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes
* 17:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 16:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 16:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 16:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 15:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all nodes
* 14:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 11:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing
* 11:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing
* 09:07 taavi: fixing 2 tools still running ruby2.1 image to use that instead of 'ruby2' in service.manifest
* 08:52 taavi: fixing 2 tools still running ruby2.5 image to use that instead of 'ruby25' in service.manifest
* 08:49 taavi: fixing 12 tools still running node6 image to use that instead of 'nodejs' in service.manifest
* 08:38 taavi: fixing 12 tools still running golang1.11 image to use that instead of 'golang111' in service.manifest
* 08:36 taavi: fixing 60 tools still running python3.4 image to use 'python3.4' instead of 'python' in service.manifest
=== 2026-03-18 ===
* 12:00 taavi: restarting existing web services to backfill HTTPRoute resources [[phab:T392356|T392356]]
* 07:37 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 07:37 filippo@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-83.tools.eqiad1.wikimedia.cloud to the cluster
* 07:23 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T419824|T419824]])
=== 2026-03-17 ===
* 12:43 taavi: shutdown tools-package-builder-04 [[phab:T401819|T401819]]
=== 2026-03-15 ===
* 03:10 andrewbogott: rebooting tools-redis-6, VM is in state ERROR
=== 2026-03-13 ===
* 22:04 taavi: reboot tools-bastion-15 [[phab:T420044|T420044]]
* 19:06 taavi: reboot tools-bastion-14 [[phab:T420044|T420044]]
=== 2026-03-12 ===
* 13:55 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76
* 13:50 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76
=== 2026-03-10 ===
* 11:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli
* 11:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli
* 09:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2026-03-09 ===
* 17:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 17:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 15:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli
* 15:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli
* 15:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
* 15:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
* 13:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a gateway role in the tools cluster
* 13:30 taavi@cloudcumin1001: Added a new k8s gateway tools-k8s-gateway-3.tools.eqiad1.wikimedia.cloud to the cluster
* 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a gateway role in the tools cluster
* 13:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a gateway role in the tools cluster
* 13:19 taavi@cloudcumin1001: Added a new k8s gateway tools-k8s-gateway-2.tools.eqiad1.wikimedia.cloud to the cluster
* 13:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a gateway role in the tools cluster
* 13:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a gateway role in the tools cluster
* 13:07 taavi@cloudcumin1001: Added a new k8s gateway tools-k8s-gateway-1.tools.eqiad1.wikimedia.cloud to the cluster
* 12:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a gateway role in the tools cluster
=== 2026-03-06 ===
* 11:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 11:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
* 11:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway
* 11:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway
* 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-system
* 11:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-system
=== 2026-03-05 ===
* 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 10:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
=== 2026-03-04 ===
* 20:10 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 19:58 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 19:57 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 19:46 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 19:46 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 19:45 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 19:44 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 19:30 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 19:29 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 19:14 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 19:14 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 19:13 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 19:13 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 19:12 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 19:12 root@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=97)
* 19:11 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 19:11 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 19:10 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 19:10 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 19:09 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 19:08 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 19:07 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 19:07 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 19:06 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 19:06 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 19:05 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 19:04 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 19:03 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 19:03 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 19:02 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 19:02 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 19:00 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 19:00 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 18:59 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 18:58 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99)
* 18:57 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster
* 18:17 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 18:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 18:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 17:57 root@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node
* 17:38 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
* 17:18 root@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node
* 16:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 14:54 dcaro: increase object quota to 400k ([[phab:T418528|T418528]])
* 14:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config
* 14:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config
* 13:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 13:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 13:42 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 13:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 13:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
=== 2026-03-03 ===
* 20:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 20:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 16:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 16:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2026-03-02 ===
* 17:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 17:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 16:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 16:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 16:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 16:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 14:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2026-02-26 ===
* 15:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 15:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 13:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component gateway-api
* 13:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component gateway-api
=== 2026-02-25 ===
* 14:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry (exit_code=0) for Istio 1.29.0
* 14:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry for Istio 1.29.0
* 14:22 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry (exit_code=99) for Istio 1.29.0
* 14:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry for Istio 1.29.0
* 14:09 taavi: taavi@tools-imagebuilder-2:~$ sudo docker system prune -a # reclaiming disk space
* 14:08 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry (exit_code=99) for Istio 1.29.0
* 14:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry for Istio 1.29.0
=== 2026-02-24 ===
* 10:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 10:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 10:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 10:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2026-02-20 ===
* 20:17 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24
* 20:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24
* 20:16 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24
* 20:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24
* 19:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}}
* 19:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}}
* 19:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}}
* 19:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}}
* 19:44 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}}
* 19:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}}
=== 2026-02-11 ===
* 21:49 taavi: remove hiera override still allowing ssh agent forwarding onto toolforge bastions [[phab:T198138|T198138]]
=== 2026-02-05 ===
* 19:08 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing
* 18:48 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing
* 16:42 volans: re-enabling puppet on NFS workers to update the infra-tracing-nfs
=== 2026-02-04 ===
* 15:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
* 15:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
* 14:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.14.3
* 14:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.14.3
=== 2026-02-03 ===
* 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx
* 09:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx
* 08:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.13.7
* 08:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.13.7
=== 2026-01-28 ===
* 15:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 15:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
=== 2026-01-23 ===
* 01:10 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 01:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2026-01-22 ===
* 18:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 18:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
=== 2026-01-15 ===
* 08:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2026-01-14 ===
* 15:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor
* 15:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions (exit_code=0) for tools-bastion-15.tools.eqiad1.wikimedia.cloud, tools-bastion-14.tools.eqiad1.wikimedia.cloud ([[phab:T413797|T413797]])
* 15:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions for tools-bastion-15.tools.eqiad1.wikimedia.cloud, tools-bastion-14.tools.eqiad1.wikimedia.cloud ([[phab:T413797|T413797]])
* 15:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor
* 15:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k
* 15:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k
* 15:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13,
tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 15:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 15:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses (exit_code=0) for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T413797|T413797]]) * 14:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T413797|T413797]]) * 14:58 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 14:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 14:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, 
tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 14:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 14:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 14:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 14:43 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 14:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, 
tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 14:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade to 1.31.14 ([[phab:T413797|T413797]]) * 13:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade to 1.31.14 ([[phab:T413797|T413797]]) === 2026-01-12 === * 17:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 17:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 17:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 17:46 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component builds-api * 17:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 17:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 17:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 17:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 16:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 16:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 16:20 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2026-01-06 === * 15:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 14:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 14:00 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 13:54 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 13:54 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 13:48 andrewbogott: removing tools-k8s-etcd-24 in prep for rebuilding cloudvirtlocal1003 * 13:47 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 03:28 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 03:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 03:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 02:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 02:53 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 02:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 02:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 02:39 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 01:59 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 01:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 01:48 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 01:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 01:42 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 01:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) === 2026-01-05 === * 23:17 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 23:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 23:10 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 23:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 23:01 
andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=97) * 22:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) === 2025-12-18 === * 11:13 godog: bump max objects quota to 200k * 11:05 godog: bump object quota to 500G === 2025-12-17 === * 17:54 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Loki 3.6.3, Alloy 1.12.1 ([[phab:T399313|T399313]]) * 17:53 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.12.1 ([[phab:T399313|T399313]]) * 17:53 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.6.3 ([[phab:T399313|T399313]]) * 17:53 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.6.3, Alloy 1.12.1 ([[phab:T399313|T399313]]) === 2025-12-15 === * 13:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 13:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 13:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 13:26 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component wmcs-k8s-metrics * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 12:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T412695|T412695]]) * 12:01 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/kube-state-metrics:v2.17.0 ([[phab:T412695|T412695]]) * 12:01 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T412695|T412695]]) === 2025-12-14 === * 02:14 andrewbogott: running 'kubectl rollout restart -n envvars-admission deployment/envvars-admission' in response to an envvars alert === 2025-12-11 === * 16:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-12-04 === * 21:37 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 21:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 21:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 21:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 20:52 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 20:45 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 20:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 20:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 20:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 20:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 20:03 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 19:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 19:56 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 19:38 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 19:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 19:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 19:13 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 19:06 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) === 2025-12-03 === * 19:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 19:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 17:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 17:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T375217|T375217]]) * 17:32 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 17:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T375217|T375217]]) === 2025-12-02 === * 20:22 andrewbogott: stop/starting harbordb1 to fix presumed mtu mismatch * 20:06 andrewbogott: rebooting tools-harbordb1 to aid with host draining * 08:31 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Alloy 1.11.3 ([[phab:T399313|T399313]]) * 08:30 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.11.3 ([[phab:T399313|T399313]]) * 08:30 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Alloy 1.11.3 ([[phab:T399313|T399313]]) === 2025-12-01 === * 22:31 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Alloy 1.4.0 
([[phab:T399313|T399313]]) * 22:30 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.4.0 ([[phab:T399313|T399313]]) * 22:30 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Alloy 1.4.0 ([[phab:T399313|T399313]]) * 16:46 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 16:26 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing === 2025-11-27 === * 12:15 volans: [continue] on the haproxy nodes * 12:15 volans: temporarily disabling puppet to deploy gerrit {{Gerrit|1211610}} === 2025-11-26 === * 14:48 volans: enabled infra-tracing-nfs on all nfs workers after testing it on few hosts * 09:46 dhinus: restarting tools-db-6 to apply a config change [[phab:T409922|T409922]] === 2025-11-25 === * 02:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 02:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 02:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 02:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 01:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 01:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 01:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 01:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2025-11-24 === * 10:24 volans@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 10:04 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing === 2025-11-20 === * 18:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-34 * 18:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-34 * 17:13 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 16:55 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing * 16:45 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging * 16:36 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 15:56 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging * 15:51 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 15:47 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 15:37 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:37 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 15:28 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:23 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 15:17 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:01 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 14:54 
volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 14:46 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 14:40 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-11-19 === * 16:19 andrewbogott: increased object count quota to 100,000 * 16:03 andrewbogott: increased object storage quota to 200GB * 08:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 08:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2025-11-18 === * 18:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-harbor-2 (cluster eqiad1) * 14:36 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-harbor-2 (cluster eqiad1) * 14:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-prometheus-9 (cluster eqiad1) * 14:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-prometheus-9 (cluster eqiad1) * 14:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-113 (cluster eqiad1) * 14:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-113 (cluster eqiad1) * 14:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-113 * 14:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-113 * 14:31 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-112 (cluster eqiad1) * 14:30 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-112 (cluster eqiad1) * 14:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-112 * 14:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-112 * 14:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-nfs-82 (cluster eqiad1) * 14:28 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-nfs-82 (cluster eqiad1) * 14:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-82 * 14:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-82 * 14:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-nfs-81 (cluster eqiad1) * 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-nfs-81 (cluster eqiad1) * 14:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-81 * 14:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-81 * 14:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-nfs-80 (cluster eqiad1) * 14:23 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-nfs-80 (cluster eqiad1) * 14:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-80 * 14:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-80 * 14:22 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-prometheus-8 (cluster eqiad1) * 14:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-prometheus-8 (cluster eqiad1) * 12:05 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:56 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 11:24 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 11:19 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-legacy-redirector-3 (cluster eqiad1) * 10:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-legacy-redirector-3 (cluster eqiad1) * 10:11 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.vps.instance.stop_start (exit_code=97) vm tools-legaci-redirector-3 (cluster eqiad1) * 10:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-legaci-redirector-3 (cluster eqiad1) === 2025-11-17 === * 18:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 18:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 18:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 18:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:10 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor ([[phab:T409981|T409981]]) * 10:06 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T409981|T409981]]) 
=== 2025-11-14 === * 16:27 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-4 ([[phab:T409287|T409287]]) * 16:26 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-4 ([[phab:T409287|T409287]]) === 2025-11-13 === * 15:13 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.instance.stop_start (exit_code=99) vm toolsbeta-test-k8s-ingress-12 (cluster eqiad1) * 15:13 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm toolsbeta-test-k8s-ingress-12 (cluster eqiad1) * 15:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-bastion-14 (cluster eqiad1) * 15:08 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-bastion-14 (cluster eqiad1) * 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-haproxy-7 (cluster eqiad1) * 12:25 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-haproxy-7 (cluster eqiad1) * 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-haproxy-8 (cluster eqiad1) * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-haproxy-8 (cluster eqiad1) === 2025-11-12 === * 15:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 15:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission === 2025-11-11 === * 15:28 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Loki 3.5.7 ([[phab:T399313|T399313]]) * 15:28 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.5.7 ([[phab:T399313|T399313]]) * 15:28 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for 
Loki 3.5.7 ([[phab:T399313|T399313]]) === 2025-11-10 === * 22:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 22:16 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) * 22:16 raymond-ndibe@cloudcumin1001: Updating container image toolsbeta-harbor.wmcloud.org/toolforge-pre-built/toolforge-bookworm-sssd:latest * 22:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 22:14 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) * 22:14 raymond-ndibe@cloudcumin1001: Updating container image toolsbeta-harbor.wmcloud.org/toolforge-pre-built/toolforge-bookworm-sssd:latest * 22:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 22:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) * 22:08 raymond-ndibe@cloudcumin1001: Updating container image toolsbeta-harbor.wmcloud.org/toolforge-pre-built/toolforge-bookworm-sssd:latest:latest * 22:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 22:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 21:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 21:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 21:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config * 21:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 21:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component 
jobs-api * 20:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 20:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 20:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 20:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 19:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 19:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 19:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 19:44 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 19:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 19:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 19:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 19:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 19:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 19:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for 
component components-api * 19:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 18:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 18:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-11-07 === * 11:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 11:45 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 11:42 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T409287|T409287]]) * 11:35 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T409287|T409287]]) * 11:34 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-7 ([[phab:T409287|T409287]]) * 11:33 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-7 ([[phab:T409287|T409287]]) === 2025-11-06 === * 16:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-11-05 === * 19:23 
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 18:40 taavi: taavi@tools-bastion-15:~ $ sudo loginctl terminate-user damian * 14:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 14:53 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 14:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T409287|T409287]]) * 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T409287|T409287]]) === 2025-11-04 === * 17:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 17:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 17:26 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 17:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 15:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 12:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 03:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:46 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 01:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-cli * 01:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-cli * 01:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-cli * 00:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-cli * 00:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 00:39 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 00:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 00:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 00:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 00:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 00:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli === 2025-11-03 === * 22:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 22:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 22:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 22:29 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 22:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 22:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 22:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 22:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 22:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 22:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 22:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 22:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 21:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 21:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 21:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 21:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 21:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 21:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 21:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for 
component logs-api * 20:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 20:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 20:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer * 20:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 18:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 18:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 11:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 11:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 11:11 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 11:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli === 2025-10-30 === * 18:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 16:26 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-cli * 14:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 11:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 11:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 11:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2025-10-29 === * 18:39 taavi: kick off script to rebuild all pre-built images, including [[phab:T407707|T407707]] * 16:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T408669|T408669]]) * 16:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T408669|T408669]]) * 16:27 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico ([[phab:T408669|T408669]]) * 15:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T408669|T408669]]) * 14:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 12:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.calico.copy_images_to_registry (exit_code=0) for Calico v3.29.6 * 12:48 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/typha:v3.29.6 * 12:47 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/node:v3.29.6 * 12:47 taavi@cloudcumin1001: Updating container image 
docker-registry.svc.toolforge.org/calico/kube-controllers:v3.29.6 * 12:46 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/ctl:v3.29.6 * 12:46 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/cni:v3.29.6 * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.calico.copy_images_to_registry for Calico v3.29.6 * 12:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 12:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 12:37 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 12:22 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico * 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico === 2025-10-28 === * 19:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:07 taavi: delete paws, paws-master security groups, long obsolete as paws is now in a separate project * 16:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 14:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 14:39 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 10:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2025-10-27 === * 22:09 taavi: copy toolviews database hiera data to a place where haproxy nodes can see them [[phab:T408454|T408454]] * 18:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 11:16 dcaro: removing taskruns/pipelineruns v1beta1 version from the stored list in the crds ([[phab:T408127|T408127]]) === 2025-10-24 === * 20:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-33, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-35, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-41, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-44, t * 18:59 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-33, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-35, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-41, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-44, tools-k8s-worker-nfs * 18:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-27 * 17:55 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-27 * 17:36 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-9, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-15, tools-k * 16:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-9, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-16, to * 16:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-105, tools-k8s-worker-106 * 16:22 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-105, tools-k8s-worker-106 * 16:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103 * 16:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103 * 16:19 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-102,tools-k8s-worker-103 * 16:19 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-102,tools-k8s-worker-103 * 13:37 andrewbogott: rebooting clouddumps100[12] for [[phab:T407110|T407110]] === 2025-10-23 === * 14:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 10:11 taavi: deleting old nginx front proxy instances [[phab:T283948|T283948]] * 10:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-10-22 === * 15:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 15:56 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.13.3 * 15:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 15:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 15:07 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component ingress-admission * 12:35 taavi: moving toolforge traffic to haproxy directly [[phab:T283948|T283948]] * 07:00 godog: delete tools-nfs-2 - [[phab:T404584|T404584]] === 2025-10-21 === * 08:53 godog: shut down tools-nfs-2 - [[phab:T404584|T404584]] * 07:52 godog: tools-nfs-3 is back - [[phab:T404584|T404584]] * 07:49 godog: resize tools-nfs-3 to match tools-nfs-2 (g4.cores16.ram64.disk20.10xiops) - [[phab:T404584|T404584]] * 00:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 00:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-10-20 === * 16:31 taavi: make logrotate run hourly on haproxy nodes [[phab:T284558|T284558]] === 2025-10-16 === * 12:01 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:52 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 08:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-10-15 === * 08:03 godog: tools-nfs-3 is back - [[phab:T404584|T404584]] * 08:00 godog: resize tools-nfs-3 - [[phab:T404584|T404584]] === 2025-10-14 === * 14:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 14:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 13:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:08 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 13:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 11:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 11:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 11:45 godog: update nfs-tools.wmcloud.org and nfs.svc.toolforge.org proxied to point to tools-nfs-3 === 2025-10-13 === * 14:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:17 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-70, tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-72, tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-75, tools-k8s-worker-nfs-76, tools-k8s-worker-nfs-77, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-79, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-80, tools-k8s-worker-nfs-81, tools-k8s-worker-nfs-82, too * 09:14 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:14 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:14 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-81 (cluster eqiad1, project tools) * 09:14 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-81 (cluster eqiad1, project tools) * 09:13 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:13 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:09 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-69 (cluster eqiad1, 
project tools) * 09:09 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-69 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-68 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-68 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-67 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-67 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-66 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-66 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-65 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-65 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-61 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-61 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-58 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-58 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: END (PASS) - Cookbook 
wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-57 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-57 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-55 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-55 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-54 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-54 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-53 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-53 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-50 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-50 (cluster eqiad1, project tools) * 09:03 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-5 (cluster eqiad1, project tools) * 09:03 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-5 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-48 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-48 (cluster 
eqiad1, project tools) * 09:02 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-47 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-47 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-46 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-46 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-45 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-45 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-44 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-44 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-43 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-43 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-42 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-42 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-41 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: 
START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-41 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-40 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-40 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-39 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-39 (cluster eqiad1, project tools) * 08:59 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-38 (cluster eqiad1, project tools) * 08:59 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-38 (cluster eqiad1, project tools) * 08:58 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:58 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:57 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:57 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:10 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:05 wmbot~godog@r5: END (FAIL) - Cookbook wmcs.nfs.migrate_service (exit_code=99) ([[phab:T404584|T404584]]) * 08:05 wmbot~godog@r5: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:04 filippo@cloudcumin1001: END (FAIL) - Cookbook wmcs.nfs.migrate_service (exit_code=99) 
([[phab:T404584|T404584]]) * 08:03 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:03 filippo@cloudcumin1001: END (FAIL) - Cookbook wmcs.nfs.migrate_service (exit_code=99) ([[phab:T404584|T404584]]) * 08:03 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:01 godog: switch NFS from tools-nfs-2 to tools-nfs-3 - [[phab:T404584|T404584]] * 07:29 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-66, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-76, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-81 === 2025-10-10 === * 09:22 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-1 * 09:10 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-1 === 2025-10-09 === * 08:21 filippo@cloudcumin1001: END (FAIL) - Cookbook wmcs.nfs.add_server (exit_code=99) * 08:15 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.add_server === 2025-10-08 === * 21:47 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-76 * 21:19 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-76 * 12:27 godog: very brief nfs interruption to wrap up [[phab:T347681|T347681]] * 10:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 09:08 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component builds-builder * 08:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 08:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 06:55 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-71 * 06:43 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-71 === 2025-10-07 === * 18:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 16:18 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-11 * 16:11 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-11 * 15:18 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-69 * 15:11 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-69 * 14:51 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-65, tools-k8s-worker-nfs-69 * 14:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-65, tools-k8s-worker-nfs-69 * 13:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 13:39 dcaro@cloudcumin1001: START - 
Cookbook wmcs.toolforge.component.deploy for component registry-admission * 12:59 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39 * 12:58 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39 * 12:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 12:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 11:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 10:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 09:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 08:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 08:08 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api * 08:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2025-10-06 === * 12:06 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-7 * 11:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-7 * 08:19 wmbot~dcaro@acme: END 
(PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found * 08:19 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 08:18 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-76 * 07:39 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-76 === 2025-10-03 === * 12:51 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-48 * 12:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-6 * 12:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-6 * 12:45 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-48 * 09:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-5 * 09:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-5 === 2025-10-02 === * 13:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) * 13:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 13:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-haproxy-7 * 13:23 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.remove_instance for instance tools-k8s-haproxy-7 * 13:23 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=99) * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 09:12 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-55 * 08:52 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-55 === 2025-10-01 === * 10:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 10:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2025-09-30 === * 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 08:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 08:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 08:19 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-67 * 08:06 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-67 === 2025-09-29 === * 13:23 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found * 13:23 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 11:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-46, 
tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:39 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:34 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:34 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 07:00 godog: kick stuck nfs workers from clouddumps1001 === 2025-09-28 === * 08:54 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:35 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-67, 
tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:15 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:13 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:12 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-1 ([[phab:T405850|T405850]]) * 08:10 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-1 ([[phab:T405850|T405850]]) * 08:10 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 08:08 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:08 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 08:08 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:08 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 08:08 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:07 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 08:07 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2025-09-25 === * 18:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 17:42 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:30 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:27 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 17:14 dcaro@cloudcumin1001: START 
- Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 15:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 12:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 12:04 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 11:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-09-24 === * 20:25 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14 * 20:19 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14 * 17:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 17:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 17:32 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 17:30 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-43 * 17:28 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for 
tools-k8s-worker-nfs-43 * 17:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 17:10 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-43 * 17:07 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-43 * 16:57 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-73 ([[phab:T400957|T400957]]) * 16:50 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-73 ([[phab:T400957|T400957]]) * 13:49 dcaro: patched all tools with new resource defaults, everything looks good * 13:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:09 dcaro: deployed jobs-api change to default resources, patching existing jobs * 13:08 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-cli * 13:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 12:36 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 12:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 12:28 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 12:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 12:11 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 12:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 03:54 
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12 * 03:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12 === 2025-09-23 === * 20:08 andrewbogott: creating puppetdbpostgres and adding it to tools-puppetdb-2 to store postgres data; the root volume of that VM was filling up and causing widespread puppet issues * 01:55 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 * 01:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 === 2025-09-22 === * 16:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 12:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-09-21 === * 09:17 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-2 * 09:02 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-2 * 03:16 dcaro: acking and silencing CPU capacity alerts to handle on Monday, they should not page * 01:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 01:46 andrew@cloudcumin1001: Added a new k8s worker tools-k8s-worker-113.tools.eqiad1.wikimedia.cloud to the cluster * 01:36 andrewbogott: adding additional worker node in response to repeated capacity alerts * 01:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role 
in the tools cluster === 2025-09-19 === * 13:09 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-11 * 13:03 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-11 === 2025-09-18 === * 13:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 13:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 11:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 11:45 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 11:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 11:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 11:29 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-prometheus-9 (cluster eqiad1, project tools) * 11:29 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot vm tools-prometheus-9 (cluster eqiad1, project tools) * 11:29 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:29 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 09:42 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 09:36 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 09:34 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers 
(exit_code=0) no stuck workers found * 09:34 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 08:52 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-3 * 08:34 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-3 * 06:47 taavi: delete tools-sgebastion-10 [[phab:T314665|T314665]] === 2025-09-17 === * 13:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-32 * 12:53 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-32 * 09:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:35 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-prometheus-9 (cluster eqiad1, project tools) * 09:35 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot vm tools-prometheus-9 (cluster eqiad1, project tools) * 09:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:34 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:23 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-66, tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-10 * 08:08 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-66, tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-10 === 2025-09-16 === * 16:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) 
for component registry-admission * 16:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 16:21 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 16:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:57 taavi: delete tools-sgebastion puppet prefix [[phab:T314665|T314665]] * 15:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:26 taavi: shutdown tools-sgebastion-10 [[phab:T314665|T314665]] * 14:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:15 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-bastion-13 * 14:09 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-bastion-13 * 14:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 13:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-bastion-12 * 13:28 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-bastion-12 * 07:11 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for 
tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-75 * 06:57 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-75 * 06:57 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 06:51 filippo@cloudcumin1001: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2025-09-15 === * 16:22 taavi: reboot old bastions to kick long-living connections into newer ones * 14:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 14:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:09 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 14:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 12:47 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-66 * 12:35 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-66 === 2025-09-12 === * 08:49 taavi: pointing login.toolforge.org to tools-bastion-15 [[phab:T392510|T392510]] * 08:33 taavi: pointing dev.toolforge.org to tools-bastion-14 [[phab:T392510|T392510]] * 07:14 godog: uncordon tools-k8s-worker-nfs-53 after failed cookbook (?) 
yesterday === 2025-09-11 === * 14:42 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-46 * 14:36 godog: drain/reboot tools-k8s-worker-nfs-46 - [[phab:T404322|T404322]] * 14:36 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-46 * 14:22 andrewbogott: actually I didn't drain tools-k8s-worker-nfs-53 because the alert cleared on its own * 14:21 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-53 * 14:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-53 * 14:21 andrewbogott: draining/rebooting tools-k8s-worker-nfs-53 because of procs in D state * 13:42 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-53 * 13:36 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-53 * 08:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-bastion-14.tools.eqiad1.wikimedia.cloud * 07:59 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-14.tools.eqiad1.wikimedia.cloud * 07:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-bastion-15.tools.eqiad1.wikimedia.cloud * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-15.tools.eqiad1.wikimedia.cloud * 07:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase === 2025-09-10 === * 14:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 14:28 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T403964|T403964]]) * 14:26 
dcaro@cloudcumin1001: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:10 dcaro@cloudcumin1001: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:08 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:45 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:31 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:31 fnegri@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:30 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T403964|T403964]]) === 2025-09-09 === * 09:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 09:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 08:55 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) ([[phab:T404047|T404047]]) * 08:55 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot ([[phab:T404047|T404047]]) === 2025-09-08 === * 15:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config * 15:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 15:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:16 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.run_tests (exit_code=97) * 12:07 dcaro@cloudcumin1001: START - 
Cookbook wmcs.toolforge.run_tests * 12:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 11:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 11:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 11:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 11:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 11:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions (exit_code=0) for tools-bastion-12.tools.eqiad1.wikimedia.cloud, tools-bastion-13.tools.eqiad1.wikimedia.cloud ([[phab:T402378|T402378]]) * 11:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions for tools-bastion-12.tools.eqiad1.wikimedia.cloud, tools-bastion-13.tools.eqiad1.wikimedia.cloud ([[phab:T402378|T402378]]) * 11:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses (exit_code=0) for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T402378|T402378]]) * 11:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T402378|T402378]]) * 11:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 10:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 10:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, 
tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 10:20 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 10:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 10:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 10:06 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 10:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, 
tools-k * 10:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 ([[phab:T402378|T402378]]) * 10:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 10:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 09:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 ([[phab:T402378|T402378]]) * 09:58 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-68: ([[phab:T402378|T402378]]) * 09:58 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68: ([[phab:T402378|T402378]]) * 09:55 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:50 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:46 
dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 09:42 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, 
tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:40 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=97) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-w * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:38 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:37 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 09:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 
09:32 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, 
tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 09:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 09:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 09:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 08:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 08:58 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 08:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 08:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:40 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.29.15 to 1.30.14 
([[phab:T402378|T402378]]) * 08:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:32 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:19 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:09 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:09 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.run_tests (exit_code=97) * 08:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests === 2025-09-06 === * 23:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for 
tools-k8s-worker-nfs-35 * 23:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-35 === 2025-09-05 === * 14:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-09-04 === * 19:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 08:52 dcaro: added 'disable-ssl' to tools replica.my.cnf === 2025-09-03 === * 17:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:02 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 16:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 15:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:09 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:09 dcaro@cloudcumin1001: START - 
Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 13:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 12:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 12:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 12:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 11:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 11:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 10:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 10:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 09:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 08:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 08:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 08:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-09-02 === * 17:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 17:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 16:46 
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 16:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 15:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 15:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 15:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 15:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 13:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 12:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-29 === * 15:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance abogott-nstesting * 15:08 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance abogott-nstesting === 2025-08-28 === * 16:52 taavi: rebuild tcl, mariadb images on top of trixie [[phab:T400256|T400256]] * 08:48 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 08:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-08-27 === * 18:05 taavi: copy missing aptly packages to trixie-<nowiki>{</nowiki>tools,toolsbeta<nowiki>}</nowiki> * 11:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 11:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-08-26 === * 13:42 dcaro: extended object 
storage quota to 100G ([[phab:T402923|T402923]]) * 10:25 dhinus: shut down tools-harbor-1 (no longer used) === 2025-08-25 === * 22:28 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-81 * 22:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-81 === 2025-08-21 === * 12:28 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers * 10:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 07:31 godog: reboot nfs workers to reset processes stuck in D state * 07:28 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 04:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-80 * 03:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-80 === 2025-08-20 === * 08:09 dcaro: deploy wmcs-k8s-metrics upgrade ([[phab:T362869|T362869]]) === 2025-08-19 === * 15:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 15:08 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component maintain-harbor * 15:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 15:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:57 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder * 14:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:50 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api * 14:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:48 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:48 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder * 14:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:46 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:44 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:42 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 14:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:37 dcaro: flipped the tools-harbor.wmcloud.org endpoint to point to tools-harbor-2 ([[phab:T350687|T350687]]) * 14:22 Raymond_Ndibe: setting tools-harbor-1 as read-only ([[phab:T350687|T350687]]) * 13:24 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 13:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:21 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 13:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:18 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 13:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:18 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 13:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 09:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-18 === * 21:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 21:07 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362869|T362869]]) * 17:49 dcaro@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/kube-state-metrics:v2.16.0 ([[phab:T362869|T362869]]) * 17:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362869|T362869]]) * 17:48 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362869|T362869]]) * 17:48 dcaro@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/metrics-server:v0.7.2 ([[phab:T362869|T362869]]) * 17:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362869|T362869]]) * 17:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 17:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 17:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 16:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-harbor-2.tools.eqiad1.wikimedia.cloud ([[phab:T350687|T350687]]) * 16:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud ([[phab:T350687|T350687]]) * 16:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 08:35 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-harbor-2.tools.eqiad1.wikimedia.cloud * 08:34 wmbot~dcaro@acme: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud === 2025-08-16 === * 21:16 andrew@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-111 * 21:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-111 * 21:13 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-111 * 21:13 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-111 === 2025-08-15 === * 19:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103 * 19:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103 * 19:21 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-103 * 19:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-103 === 2025-08-14 === * 15:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 15:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 15:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:38 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-107, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-41 * 11:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-107, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-41 * 11:33 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found * 11:33 wmbot~dcaro@acme: START 
- Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 02:31 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 02:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-13 === * 16:36 dcaro: reverting jobs-api release ([[phab:T401846|T401846]]) * 11:18 taavi: delete tools-prometheus-6, shutdown for a while * 08:51 godog: bounce stashbot * 08:33 godog: refresh machine-id on tools-k8s-worker-[102-103,105-112].tools.eqiad1.wikimedia.cloud,tools-k8s-worker-nfs-[1-3,5,7-14,16-17,19,21-24,26-27,32-48,50,53-55,57-58,61,65-82].tools.eqiad1.wikimedia.cloud === 2025-08-12 === * 16:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config * 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 15:34 taavi: building initial trixie based images [[phab:T400255|T400255]] * 12:50 dcaro: redeploy kyverno ([[phab:T394787|T394787]]) * 12:49 dcaro: manually migrate cleanuppolicies.kyverno.io and clustercleanuppolicies.kyverno.io (using kyverno cli) ([[phab:T394787|T394787]]) * 10:01 dcaro: starting upgrade for kyverno ([[phab:T394787|T394787]]) * 10:00 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:54 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:53 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:53 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:52 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for 
tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:52 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 03:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 03:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli === 2025-08-11 === * 12:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 12:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 12:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 10:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 08:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-harbor-2.tools.eqiad1.wikimedia.cloud * 08:44 dcaro@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud * 08:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-58 * 08:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-58 === 2025-08-08 === * 06:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 06:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-07 === * 14:26 andrew@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-67 * 14:20 andrewbogott: draining and rebooting tools-k8s-worker-nfs-67 * 14:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67 * 10:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-06 === * 17:54 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 17:53 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 17:41 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component webservice-cli ([[phab:T401014|T401014]]) * 17:41 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli ([[phab:T401014|T401014]]) * 17:39 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component webservice-cli ([[phab:T401014|T401014]]) * 17:38 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli ([[phab:T401014|T401014]]) * 17:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-05 === * 16:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 16:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:03 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 13:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 11:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-24 * 11:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-24 * 09:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 08:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 03:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 03:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 03:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 03:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 03:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 03:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 02:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 02:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 02:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 02:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 02:39 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component registry-admission * 02:36 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 02:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 02:34 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 02:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 02:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 02:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 02:30 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component maintain-harbor * 02:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 02:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 02:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 02:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 02:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 02:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 01:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 01:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 01:10 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 01:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 01:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 01:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 00:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission === 2025-08-04 === * 13:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'filippo' in role 'member' ([[phab:T401091|T401091]]) * 13:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.add_user_to_project for user 'filippo' in role 'member' ([[phab:T401091|T401091]]) * 11:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-79, tools-k8s-worker-nfs-2 * 11:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-79, tools-k8s-worker-nfs-2 === 2025-08-01 === * 03:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-cli === 2025-07-31 === * 16:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging * 16:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 15:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy 
for component jobs-api * 04:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 04:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 04:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 04:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 04:17 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 04:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 04:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 04:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 04:05 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli * 04:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 04:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 04:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli === 2025-07-30 === * 08:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 08:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-07-29 === * 16:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:02 
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 * 15:56 andrewbogott: draining and restarting tools-k8s-worker-nfs-74 * 15:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 * 15:44 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58, tools-k8s-worker-nfs-32 * 15:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58, tools-k8s-worker-nfs-32 * 15:32 andrewbogott: draining and restarting tools-k8s-worker-nfs-58 and tools-k8s-worker-nfs-32 * 14:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 14:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 13:16 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:16 wmbot~dcaro@acme: Added a new k8s worker-nfs tools-k8s-worker-nfs-82.tools.eqiad1.wikimedia.cloud to the cluster * 13:06 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:06 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 13:06 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:05 wmbot~dcaro@acme: Added a new k8s worker-nfs tools-k8s-worker-nfs-81.tools.eqiad1.wikimedia.cloud to the cluster * 12:53 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:53 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the 
tools cluster * 12:53 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:46 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:40 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:40 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 12:40 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.quota_increase * 12:39 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:38 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:31 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:31 wmbot~dcaro@acme: Added a new k8s worker-nfs tools-k8s-worker-nfs-80.tools.eqiad1.wikimedia.cloud to the cluster * 12:22 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:22 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:22 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:22 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:18 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:00 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a 
worker-nfs role in the tools cluster * 09:54 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:54 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 09:54 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:54 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=97) * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:29 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:29 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:29 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:28 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) * 09:28 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:07 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 09:02 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:02 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 09:01 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 08:59 wmbot~dcaro@acme: Added a new k8s worker tools-k8s-worker-112.tools.eqiad1.wikimedia.cloud to the cluster * 08:49 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools 
cluster * 08:49 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 08:49 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 08:15 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 08:15 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 08:15 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 08:14 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster === 2025-07-28 === * 20:28 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 20:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:25 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 20:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:24 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 20:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:23 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 20:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 19:56 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 19:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 19:49 raymond-ndibe@cloudcumin1001: END (FAIL) - 
Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 19:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 19:44 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 19:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 11:58 taavi: update pywikibot image to 10.2.0 [[phab:T396933|T396933]] === 2025-07-26 === * 07:16 wmbot~root@toolforge: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli * 07:16 wmbot~root@toolforge: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli === 2025-07-23 === * 18:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 18:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor === 2025-07-21 === * 17:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 * 17:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 === 2025-07-19 === * 13:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 13:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 === 2025-07-18 === * 10:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-44 * 10:34 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-44 === 2025-07-14 === * 12:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-78 * 12:39 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-78 === 2025-07-13 === * 03:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-12 * 03:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-12 === 2025-07-11 === * 17:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 09:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-77, tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-37 * 09:25 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-77, tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-37 === 2025-07-09 === * 17:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 14:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli * 14:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli * 10:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli * 10:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli * 09:55 dcaro: adding arch arm64 to all toolforge repos ([[phab:T398016|T398016]]) * 09:40 dcaro: added arch arm64 to jessie-tools repo ([[phab:T398016|T398016]]) === 2025-07-08 === * 17:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:14 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component components-api * 15:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 15:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 12:42 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging === 2025-07-07 === * 17:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-53 * 17:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-53 * 16:39 dcaro: pushed new ci image docker-registry.svc.toolforge.org/cloud-cicd-py3.11-bookworm-tox:latest * 16:05 dcaro: clearing images from tools-imagebuilder-2 as it's out of space * 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 11:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 08:26 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging * 08:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging === 2025-07-06 === * 16:38 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75, tools-k8s-worker-nfs-8 * 16:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75, tools-k8s-worker-nfs-8 === 2025-07-05 === * 00:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-57 * 00:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55, 
tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-57 * 00:31 andrewbogott: restarting tools-k8s-worker-nfs-55 tools-k8s-worker-nfs-47 tools-k8s-worker-nfs-57, too many D state procs === 2025-07-04 === * 14:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-24 * 14:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-24 * 13:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36 * 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 === 2025-07-03 === * 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 14:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 14:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 13:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 13:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging * 13:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 13:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 * 13:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24 * 10:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:34 
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 08:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging * 08:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 08:26 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging * 08:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging === 2025-07-02 === * 13:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-55 * 13:30 andrewbogott: restarting stuck tools tools-k8s-worker-nfs-74 tools-k8s-worker-nfs-39 tools-k8s-worker-nfs-55 * 13:30 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-55 * 10:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 10:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 10:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 09:56 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 09:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:16 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 09:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-07-01 === * 16:39 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 16:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 15:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 15:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 15:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging * 15:23 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 15:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 15:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 15:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging * 15:15 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 14:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:31 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-5 ([[phab:T398170|T398170]]) * 14:30 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-5 ([[phab:T398170|T398170]]) * 14:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 14:10 
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 13:51 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 13:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 13:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:51 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 12:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 12:03 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 11:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 11:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 11:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 10:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 10:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 10:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 09:56 
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 09:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 09:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder === 2025-06-30 === * 23:01 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-14 * 22:50 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-14 * 13:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-69, tools-k8s-worker-nfs-70 * 13:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-69, tools-k8s-worker-nfs-70 * 10:51 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T398170|T398170]]) * 10:47 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T398170|T398170]]) * 10:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T398170|T398170]]) * 10:46 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T398170|T398170]]) * 10:46 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' ([[phab:T398170|T398170]]) * 10:45 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T398170|T398170]]) * 10:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T398170|T398170]]) * 10:45 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T398170|T398170]]) * 10:44 fnegri@cloudcumin1001: END 
(FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' ([[phab:T398170|T398170]]) * 10:43 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T398170|T398170]]) === 2025-06-28 === * 10:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-24 * 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-24 * 10:13 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67,tools-k8s-worker-nfs-43,tools-k8s-worker-nfs-22,tools-k8s-worker-nfs-5,tools-k8s-worker-nfs-24 * 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67,tools-k8s-worker-nfs-43,tools-k8s-worker-nfs-22,tools-k8s-worker-nfs-5,tools-k8s-worker-nfs-24 * 10:12 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67 * 10:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67 * 10:12 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-67 * 10:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67 * 10:08 dcaro: left a tmux running with a script to restart nginx if stuck * 09:59 dcaro: restarted nginx in tools-static === 2025-06-27 === * 18:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-46 * 17:58 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-46 === 2025-06-26 === * 16:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 16:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 14:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 14:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 13:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 12:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 12:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-06-25 === * 18:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 18:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 17:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:52 chuckonwumelu@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 13:50 chuckonwumelu@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component components-cli * 11:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 11:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 02:18 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-38 * 02:07 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-38 === 2025-06-24 === * 16:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 15:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-33 * 15:06 andrewbogott: rebooting tools-k8s-worker-nfs-33, stuck processes * 15:06 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33 * 15:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 15:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 12:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:22 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 12:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-06-23 === * 09:08 taavi: restrict logging in to tools-sgebastion-10 (aka login-buster) [[phab:T397459|T397459]] === 2025-06-22 === * 00:09 andrewbogott: rebooting 
tools-prometheus-8 === 2025-06-21 === * 16:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-12 * 15:58 andrewbogott: rebooting tools-k8s-worker-nfs-54 tools-k8s-worker-nfs-12, lots of D state * 15:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-12 * 10:09 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:27 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:27 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) * 09:26 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2025-06-19 === * 18:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers * 17:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 17:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 13:56 dcaro: reboot tools-sgebastion-10 as it's stuck on NFS for some tools === 2025-06-18 === * 14:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 14:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 04:22 andrewbogott: rebooting tools-prometheus-8; unreachable === 2025-06-16 === * 17:41 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 17:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 12:45 
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39 * 12:39 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39 === 2025-06-14 === * 16:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37 * 16:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37 === 2025-06-12 === * 10:36 dcaro: rebooting tools-prometheus-8 due to the VM having load issues (not responding to ssh) * 10:34 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 10:28 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2025-06-11 === * 13:39 chuckonwumelu@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 13:33 chuckonwumelu@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Loki 3.5.0, Alloy 1.9.1 * 11:18 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.9.1 * 11:18 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.5.0 * 11:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.5.0, Alloy 1.9.1 * 11:09 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=99) for Loki 3.5.0, Alloy 1.9.1 * 11:09 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.5.0 * 11:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.5.0, Alloy 1.9.1 === 2025-06-10 === * 17:04 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component maintain-harbor * 17:00 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 16:41 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 16:28 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 16:26 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer * 16:21 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 15:45 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:33 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:21 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 15:15 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 14:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 14:57 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 11:48 taavi: add AAAA records to tools/toolsbeta-harbor proxies, previous monitoring issues resolved === 2025-06-06 === * 21:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-74 * 21:40 andrewbogott: restarting tools-prometheus-9 and tools-prometheus-8, lots of tools metrics just went dark * 21:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-74 * 18:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 18:20 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for 
component jobs-cli * 15:20 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-5 * 15:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-5 === 2025-06-05 === * 22:24 andrewbogott: running /srv/tools/cleanup.sh on tools-nfs-2 in a screen session, trying to clear disk space alert * 15:06 chuckonwumelu@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:53 chuckonwumelu@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2025-05-30 === * 16:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-46 * 15:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-46 * 15:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-11 * 15:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 15:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 15:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-11 * 15:28 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component components-api * 15:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 07:38 taavi: reboot tools-static-15 to unstuck NFS things === 2025-05-24 === * 12:57 
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-65 * 12:50 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-65 === 2025-05-23 === * 16:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-65 * 16:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-65 * 03:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-43 * 02:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-43 === 2025-05-22 === * 21:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 21:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 21:17 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-45, tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-55 * 20:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-45, tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-55 * 20:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 19:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 19:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-53, 
tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-21 * 19:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 19:26 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 19:15 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-53, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-21 * 19:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 18:15 dcaro: restart tools-static nginx due to nfs hiccup * 08:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-8 * 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-8 * 08:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-7 * 08:01 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-7 * 07:58 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=1) for instance toolsbeta-prometheus-1 * 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance toolsbeta-prometheus-1 * 07:33 taavi: add AAAA record on *.toolforge.org [[phab:T211575|T211575]] === 2025-05-21 === * 15:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-proxy-10.tools.eqiad1.wikimedia.cloud * 15:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-proxy-9.tools.eqiad1.wikimedia.cloud * 15:24 taavi@cloudcumin1001: START - 
Cookbook wmcs.vps.refresh_puppet_certs on tools-proxy-10.tools.eqiad1.wikimedia.cloud * 15:24 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-proxy-9.tools.eqiad1.wikimedia.cloud * 13:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 13:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase * 09:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-prometheus-9.tools.eqiad1.wikimedia.cloud * 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-prometheus-9.tools.eqiad1.wikimedia.cloud * 09:27 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/busybox:1.35 * 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/bitnami-kubectl:1.30.2 * 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-reports-controller:v1.13.6 * 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-cleanup-controller:v1.13.6 * 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-background-controller:v1.13.6 * 09:25 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyvernopre:v1.13.6 * 09:25 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno-cli:v1.13.6 * 09:25 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno:v1.13.6 * 09:25 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:04 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 09:04 wmbot~dcaro@acme: Updating container image 
docker-registry.tools.wmflabs.org/busybox:1.35 * 09:04 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2 * 09:04 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6 * 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6 * 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6 * 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6 * 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6 * 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6 * 09:03 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 08:54 dcaro: deployed the new dns entry for docker-registry.svc.toolforge.org (might take some time to refresh) * 08:47 dcaro: deleting docker-registry.svc.toolforge.org proxy to use dns entry to floating ip instead === 2025-05-20 === * 19:40 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 19:40 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/busybox:1.35 * 19:40 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2 * 19:40 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6 * 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6 * 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6 * 19:39 wmbot~dcaro@acme: Updating container 
image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6 * 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6 * 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6 * 19:39 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 17:18 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 17:18 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/busybox:1.35 * 17:18 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2 * 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6 * 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6 * 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6 * 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6 * 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6 * 17:16 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6 * 17:16 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 16:11 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 16:11 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno:v1.13.6 * 16:11 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 15:48 wmbot~dcaro@acme: END (PASS) - Cookbook 
wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 15:48 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/busybox:1.35 * 15:47 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2 * 15:46 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports:v1.13.6 * 15:46 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup:v1.13.6 * 15:45 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background:v1.13.6 * 15:45 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6 * 15:44 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6 * 15:44 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6 * 15:44 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 15:01 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 15:00 wmbot~dcaro@acme: Updating container image toolforge-kyverno-kyverno:v1.13.6 * 15:00 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 14:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 14:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 14:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 14:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 14:58 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=97) * 14:58 wmbot~dcaro@acme: Updating container image toolforge-kyverno-kyverno:v1.13.6 * 
14:58 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 13:57 taavi: disable host-based authentication in sshd config, not used since grid shutdown * 13:08 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-prometheus-7 * 13:07 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-prometheus-7 * 13:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-prometheus-7 * 13:05 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-prometheus-7 * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-prometheus-8.tools.eqiad1.wikimedia.cloud * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-prometheus-8.tools.eqiad1.wikimedia.cloud * 09:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 09:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase === 2025-05-19 === * 08:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 08:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers === 2025-05-16 === * 18:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 18:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 17:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-9 * 17:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor ([[phab:T394520|T394520]]) * 16:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-9 * 16:51 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T394520|T394520]]) * 16:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor ([[phab:T394520|T394520]]) * 16:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T394520|T394520]]) * 16:44 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 16:44 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 16:43 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 16:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 12:08 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 12:07 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers === 2025-05-14 === * 17:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy 
for component jobs-api * 17:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 17:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 14:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 10:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 09:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 08:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 08:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2025-05-13 === * 15:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 15:33 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 07:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-36 * 07:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 === 2025-05-12 === * 19:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:18 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 16:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 13:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:23 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 13:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:04 arturo: add container image to docker registry docker-registry.tools.wmflabs.org/tofu-provisioning:20250512 ([[phab:T393686|T393686]]) * 11:51 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 11:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 11:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 10:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 10:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 09:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 09:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 08:59 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 08:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 02:36 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19 * 02:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19 === 2025-05-10 === * 17:35 lucaswerkmeister: root@tools-bastion-13:~# systemctl restart sssd-sudo<nowiki>{</nowiki>,.socket<nowiki>}</nowiki> # looks like the reset-failed didn’t work properly, systemd didn’t even try to start the service again afaict ([[phab:T393732|T393732]]) * 17:34 lucaswerkmeister: root@tools-bastion-13:~# systemctl reset-failed sssd-<nowiki>{</nowiki>pam,sudo<nowiki>}</nowiki>.service && systemctl restart sssd-pam<nowiki>{</nowiki>,-priv<nowiki>}</nowiki>.socket # try to reset the rate limits this way ([[phab:T393732|T393732]]) * 16:22 lucaswerkmeister: systemctl restart sssd-<nowiki>{</nowiki>pam<nowiki>{</nowiki>,-priv<nowiki>}</nowiki>,sudo<nowiki>}</nowiki>.socket # service-start-limit-hit, [[phab:T393732|T393732]]? * 14:10 lucaswerkmeister: root@tools-bastion-13:~# systemctl restart sssd-sudo.socket # service-start-limit-hit, [[phab:T393732|T393732]]? * 11:53 lucaswerkmeister: [[phab:T393732|T393732]] note: restart of sssd-pam.service actually failed, “may be requested by dependency only”; overall it still seems to have worked though (so next time restarting the sockets is probably sufficient) * 11:52 lucaswerkmeister: root@tools-bastion-13:~# systemctl restart sssd-pam<nowiki>{</nowiki>,<nowiki>{</nowiki>,-priv<nowiki>}</nowiki>.socket<nowiki>}</nowiki> # all three failed with start-limit-hit / Start request repeated too quickly; [[phab:T393732|T393732]]? 
=== 2025-05-09 === * 12:31 arturo: hard-reboot tools-bastion-13 (login.toolforge.org) because unresponsive (out of memory) -- previous reboot was for tools-bastion-12 (dev.t.o) by mistake * 12:29 arturo: hard-reboot tools-bastion-12 (login.toolforge.org) because unresponsive (out of memory) * 07:10 taavi: kill bunch of unwanted processes off of tools-bastion-13 [[phab:T393732|T393732]], please run your things as jobs === 2025-05-08 === * 17:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 17:40 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 17:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 17:39 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 17:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 17:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 17:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 16:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 16:48 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 16:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 16:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 16:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 16:46 raymond-ndibe@cloudcumin1001: END (ERROR) - 
Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component envvars-admission * 16:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 13:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 13:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 09:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:24 taavi: root@tools-bastion-13:~# systemctl restart sssd-sudo.socket # was in failed state * 08:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-05-07 === * 18:00 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-legacy-redirector-2 * 17:58 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-legacy-redirector-2 * 16:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 14:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 12:58 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 12:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 12:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 11:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 10:36 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api * 10:30 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 09:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 09:40 taavi: remove 'roots' ldap sudo policy [[phab:T392797|T392797]] * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:33 dcaro: released jobs-cli 16.1.12 * 09:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 09:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli === 2025-05-06 === * 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:21 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 16:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 15:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 15:24 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 15:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 13:21 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 13:12 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 13:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:55 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 12:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-69 * 12:10 dcaro: rebooting tools-k8s-worker-nfs-69 due to some stuck processes * 12:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-69 === 2025-05-04 === * 11:12 dcaro: deleting tools-services-05, has been off for a year (replaced with 06) === 2025-05-02 === * 18:37 taavi: add elasticsearch credential for tools.techcontribs [[phab:T393209|T393209]] * 13:55 taavi: reboot tools-static-15 === 2025-04-28 === * 13:07 dhinus: tools-db-4: systemctl stop mariadb && systemctl start mariadb [[phab:T392596|T392596]] * 13:06 dhinus: tools-db-5: systemctl stop mariadb && systemctl start mariadb [[phab:T392596|T392596]] * 13:05 dhinus: tools-db-5: systemctl stop mariadb && systemctl start mariadb [[phab:T318479|T318479]] === 2025-04-24 === * 
23:09 bd808: `systemctl stop sssd; rm -rf /var/lib/sss/db/*; systemctl restart sssd` on tools-bastion-12 * 23:03 bd808: `sss_cache -E` on tools-bastion-12 after seeing "sudo: PAM account management error: Authentication service cannot retrieve authentication info" * 18:50 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 18:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 18:38 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-cli * 18:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 18:32 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-cli * 18:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 11:51 taavi: add missing ICMPv6 security group rule to 'default' group * 08:02 taavi: add an AAAA record for toolserver.org [[phab:T392506|T392506]] === 2025-04-23 === * 19:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 * 19:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 * 15:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-legacy-redirector-3.tools.eqiad1.wikimedia.cloud * 15:55 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-3.tools.eqiad1.wikimedia.cloud * 15:10 arturo: give `tools-tofu` bot account member powers for https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning * 13:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:36 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component builds-api * 11:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 11:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 07:02 taavi: rebooting tools-mail-4 with stuck NFS handles === 2025-04-21 === * 09:52 taavi: update pywikibot-scripts-stable image to v10.0.0 [[phab:T385400|T385400]] === 2025-04-17 === * 16:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 16:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder === 2025-04-16 === * 19:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 19:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 19:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 19:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 14:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-04-15 === * 13:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 11:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-04-11 === * 21:21 
raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 21:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 20:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-04-10 === * 15:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76 * 15:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76 === 2025-04-09 === * 21:35 bd808: Removed rook and sstefanova from https://gitlab.wikimedia.org/groups/toolforge-repos/ owners (both offboarded former WMCS staff) * 10:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-04-08 === * 15:17 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 15:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 02:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 02:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2025-04-07 === * 19:26 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 19:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:48 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-ingress-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:47 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:40 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:33 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-109 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:32 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-109 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:11 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:10 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:10 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:08 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:08 
fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-79 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:07 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:07 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-79 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:07 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-78 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:06 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-78 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-77 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:04 fnegri@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:04 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:04 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:04 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-77 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:04 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-76 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:03 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:03 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:03 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:03 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-76 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-75 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node 
tools-k8s-worker-nfs-55 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-75 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-74 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:00 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:00 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.28.14 to 1.29.15 
([[phab:T390214|T390214]]) * 13:00 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 13:00 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:59 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-74 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:59 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-73 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:59 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:59 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-73 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-72 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:58 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:57 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-72 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-71 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:56 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node 
tools-k8s-worker-nfs-33 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-71 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-70 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:54 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:54 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-70 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.28.14 to 1.29.15 
([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:53 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-69 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:51 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:51 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-69 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-68 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-111 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-68 
from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-67 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-111 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-110 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:48 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:48 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-67 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-110 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-66 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 
fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-66 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-65 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:45 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-worker-nfs-21 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-65 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:44 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:43 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:43 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:43 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:43 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:42 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.28.14 to 1.29.15 
([[phab:T390214|T390214]]) * 12:42 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:42 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:41 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:41 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-104 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:41 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:40 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:40 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:38 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:37 fnegri@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:30 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-control-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:22 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:22 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:15 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 12:07 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 11:57 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 11:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 11:54 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]]) * 08:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 08:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 07:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 07:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 05:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 05:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli 
=== 2025-04-06 === * 02:12 andrewbogott: truncating large logfiles on tools nfs === 2025-04-04 === * 10:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 09:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 09:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 09:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 09:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 09:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 09:21 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer * 09:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 09:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 09:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 08:17 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 08:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 08:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 07:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 07:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 07:23 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 07:03 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 07:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 02:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes === 2025-04-03 === * 22:26 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all nodes * 22:25 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43 * 22:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43 * 22:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14 * 22:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14 * 22:22 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-33 * 22:17 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43 * 22:16 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33 * 22:13 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-71 * 22:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43 * 22:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-70, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-74 * 22:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-71 * 21:52 andrew@cloudcumin1001: START - 
Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-70, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-74 * 21:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 * 21:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 * 20:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 20:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 08:51 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13 * 08:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13 === 2025-04-02 === * 20:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-55 * 20:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-55 * 12:42 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-48 * 12:37 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-48 === 2025-04-01 === * 14:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45 * 13:59 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-41 * 13:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45 * 13:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 * 13:54 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-41 * 13:49 fnegri@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24 === 2025-03-31 === * 12:48 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 12:42 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 * 12:03 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76 * 11:58 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76 * 09:04 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 * 08:59 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 === 2025-03-28 === * 16:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 * 16:40 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 * 13:58 taavi: reboot tools-static-15 due to stuck nginx worker processes * 10:10 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T389733|T389733]]) * 10:00 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T389733|T389733]]) * 09:42 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor ([[phab:T389733|T389733]]) * 09:30 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T389733|T389733]]) === 2025-03-27 === * 17:34 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-33 * 17:26 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-33 * 17:26 root@cloudcumin1001: END (ERROR) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=97) for all NFS workers * 15:59 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 15:53 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for all NFS workers * 15:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 15:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 15:02 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-111.tools.eqiad1.wikimedia.cloud to the cluster * 14:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 * 14:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 14:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 * 14:33 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 * 14:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 === 2025-03-25 === * 15:32 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:18 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-2 * 13:58 andrewbogott: rebooting tools-k8s-worker-nfs-2 * 13:58 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-2 * 10:32 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 10:32 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 08:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 08:39 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx * 08:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2025-03-24 === * 18:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 18:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 18:24 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder * 18:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 18:16 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder * 18:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 17:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:40 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 17:35 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api * 17:35 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 17:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 09:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 === 2025-03-22 === * 04:00 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 * 03:55 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 * 03:51 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 * 03:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 === 2025-03-20 === * 14:04 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'chuckonwumelu' in role 'member' * 14:04 aborrero@cloudcumin1001: START - Cookbook wmcs.vps.add_user_to_project for user 'chuckonwumelu' in role 'member' === 2025-03-18 === * 15:23 arturo: hard-reboot tools-prometheus-6, not responding to ssh * 10:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 10:30 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 * 10:03 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T383238|T383238]]) * 09:57 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T383238|T383238]]) === 2025-03-17 === * 19:01 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 19:00 
wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 18:42 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:41 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:37 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:36 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:32 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:32 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 14:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 ([[phab:T388965|T388965]]) * 14:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 ([[phab:T388965|T388965]]) === 2025-03-16 === * 11:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 === 2025-03-15 === * 15:31 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 15:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 15:14 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for 
tools-k8s-worker-nfs-16,tools-k8s-worker-nfs-34,tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 15:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16,tools-k8s-worker-nfs-34,tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 12:55 dcaro: there was an NFS hiccup that made the NFS checks fail for a second and some workers get stuck for a bit [[phab:T388965|T388965]] === 2025-03-13 === * 22:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 22:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 18:14 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T362868|T362868]]) * 18:04 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T362868|T362868]]) * 18:00 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api ([[phab:T362868|T362868]]) * 17:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api ([[phab:T362868|T362868]]) * 17:40 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission ([[phab:T362868|T362868]]) * 17:29 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission ([[phab:T362868|T362868]]) * 17:27 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission ([[phab:T362868|T362868]]) * 17:17 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission ([[phab:T362868|T362868]]) * 17:14 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api 
([[phab:T362868|T362868]]) * 17:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362868|T362868]]) * 16:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission ([[phab:T362868|T362868]]) * 16:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission ([[phab:T362868|T362868]]) * 16:25 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission ([[phab:T362868|T362868]]) * 16:14 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission ([[phab:T362868|T362868]]) * 10:17 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 * 10:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 === 2025-03-12 === * 17:56 dhinus: aptly repo remove bookworm-tools helmfile, removing custom version that is older than the one from apt.w.o * 03:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-03-11 === * 17:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 14:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:31 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 14:27 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:58 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 10:46 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission === 2025-03-10 === * 20:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 20:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 20:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 20:20 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 20:09 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 20:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 20:05 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 20:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 19:59 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 19:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:55 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 19:51 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:50 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 19:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 18:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 17:44 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 17:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder === 2025-03-07 === * 13:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-5 * 13:18 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-5 === 2025-03-06 === * 13:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 12:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 12:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 12:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 12:15 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 12:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-03-05 === * 19:16 dhinus: systemctl restart prometheus@tools on tools-prometheus-7 (the two prom hosts are returning 
different values) * 17:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362868|T362868]]) * 17:44 fnegri@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.2 ([[phab:T362868|T362868]]) * 17:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362868|T362868]]) * 16:06 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 16:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 09:13 dcaro: restarting ingress pods due to ingress timing out sometimes * 08:09 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 08:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2025-03-04 === * 20:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:47 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 20:28 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 15:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362868|T362868]]) * 14:01 fnegri@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.12.0 ([[phab:T362868|T362868]]) * 14:01 
fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362868|T362868]]) * 13:51 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:42 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:40 dhinus: reboot tools-legacy-redirector-2 (http probes failing more than usual) * 12:50 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 12:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 10:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 09:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 09:15 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 09:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:58 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-03-03 === * 17:04 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:55 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 16:18 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 16:09 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 13:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld 
* 13:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 13:10 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 13:01 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 11:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 11:15 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 09:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-03-01 === * 19:08 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 19:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 * 16:26 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 * 16:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 * 15:51 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 * 15:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 === 2025-02-27 === * 16:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component 
builds-builder * 14:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder === 2025-02-26 === * 14:22 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:05 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-02-25 === * 19:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 === 2025-02-24 === * 21:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 21:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 21:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 20:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 20:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-02-21 === * 12:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 === 2025-02-20 === * 13:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer 
([[phab:T320284|T320284]]) * 13:18 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer ([[phab:T320284|T320284]]) === 2025-02-19 === * 20:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 20:25 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 20:25 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37 * 20:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37 === 2025-02-18 === * 17:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-54 * 17:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-54 * 16:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 16:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 * 15:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103, tools-k8s-worker-108, tools-k8s-control-7 ([[phab:T380679|T380679]]) * 15:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103, tools-k8s-worker-108, tools-k8s-control-7 ([[phab:T380679|T380679]]) * 15:03 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 ([[phab:T380679|T380679]]) * 15:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 ([[phab:T380679|T380679]]) === 2025-02-17 === * 17:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 17:32 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld === 2025-02-10 === * 12:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 12:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 12:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 12:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 12:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 12:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor === 2025-02-09 === * 16:38 andrewbogott: rebooting tools-db-4 just in case that helps with the recurring DB crashes === 2025-02-07 === * 20:51 arturo: resize tools-legacy-redirector to have 2 vCPUs [[phab:T385908|T385908]] * 17:58 andrewbogott: ran "SET GLOBAL read_only=OFF;" on tools-db-4; both tools-db-5 and tools-db-4 had been set to read-only. No idea why or how... 
* 01:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7 * 01:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7 * 01:28 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-07 * 01:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-07 * 01:27 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-07 * 01:27 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-07 === 2025-02-06 === * 17:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 17:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 15:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 14:50 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 14:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 14:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 14:36 raymond-ndibe@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:06 andrewbogott: cold-migrating tools-proxy-8 for [[phab:T385264|T385264]]; will cause a brief toolforge outage * 14:05 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 14:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:39 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 13:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 13:06 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 13:02 raymond-ndibe@cloudcumin1001: START - 
Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 12:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 12:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 12:37 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 12:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 12:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2025-02-03 === * 14:40 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-haproxy-5, tools-k8s-haproxy-6 * 14:40 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-haproxy-5, tools-k8s-haproxy-6 * 13:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-9, tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 * 13:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-9, tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 * 13:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 * 13:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 * 13:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-7 * 13:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 === 2025-02-01 === * 15:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for 
tools-k8s-worker-108 * 15:05 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-108 * 15:05 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-107 * 15:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-107 * 15:04 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-106 * 15:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-106 * 15:03 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-105 * 15:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-105 * 15:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103 * 15:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103 * 15:01 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-102 * 15:01 andrewbogott: rebooting all k8s (non-nfs) worker nodes for [[phab:T385264|T385264]] * 15:00 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-102 * 14:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 14:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 * 14:56 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 * 14:55 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 * 14:55 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-71 * 14:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-71 * 14:53 andrew@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-66 * 14:48 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-66 * 14:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54 * 14:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54 * 14:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50 * 14:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50 * 14:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-47 * 14:45 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-47 * 14:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-46 * 14:44 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-46 * 14:43 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45 * 14:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45 * 14:42 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43 * 14:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43 * 14:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-40 * 14:40 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-40 * 14:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39 * 14:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39 * 14:38 
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-3 * 14:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-3 * 14:37 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-32 * 14:36 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-32 * 14:36 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 * 14:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24 * 14:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-1 * 14:34 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1 * 14:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17 * 14:33 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17 * 14:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14 * 14:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14 * 14:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13 * 14:30 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13 * 14:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12 * 14:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12 * 14:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-11 * 14:29 andrewbogott: rebooting all k8s-nfs worker nodes for [[phab:T385264|T385264]] * 14:28 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-11 * 14:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 * 14:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 * 14:22 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 * 14:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 * 14:20 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 * 14:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 === 2025-01-31 === * 11:04 dhinus: systemctl restart prometheus@tools on tools-prometheus-7 [[phab:T385262|T385262]] === 2025-01-29 === * 01:10 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli === 2025-01-27 === * 16:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 15:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:52 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 13:52 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component jobs-api * 13:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-01-26 === * 22:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 * 22:04 andrewbogott: restarting node tools-k8s-worker-nfs-44, too many processes in D state * 22:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 * 22:02 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-m8s-worker-nfs-44 * 22:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-m8s-worker-nfs-44 * 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud * 08:37 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud * 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:37 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-79.tools.eqiad1.wikimedia.cloud to the cluster * 08:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T384790|T384790]]) * 08:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster 
* 08:26 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-78.tools.eqiad1.wikimedia.cloud to the cluster * 08:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T384790|T384790]]) * 08:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:16 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-77.tools.eqiad1.wikimedia.cloud to the cluster * 08:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T384790|T384790]]) * 08:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 08:06 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-110.tools.eqiad1.wikimedia.cloud to the cluster * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster ([[phab:T384790|T384790]]) * 07:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 07:56 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud to the cluster * 07:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster ([[phab:T384790|T384790]]) * 07:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-55 * 07:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-55 === 2025-01-24 === * 10:39 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-41 * 10:34 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-41 === 2025-01-23 === * 
14:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:39 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:10 dcaro: reboot tools-static-15 due to nginx stuck on nfs === 2025-01-22 === * 17:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23 * 17:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23 === 2025-01-18 === * 15:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7 * 15:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7 === 2025-01-17 === * 15:52 dhinus: reboot tools-legacy-redirector-2 (http probes were failing) === 2025-01-15 === * 04:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 04:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 03:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-01-13 === * 21:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-47 ([[phab:T383625|T383625]]) * 21:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-47 ([[phab:T383625|T383625]]) * 21:30 andrew@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 ([[phab:T383625|T383625]]) * 21:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19 ([[phab:T383238|T383238]]) * 21:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 ([[phab:T383625|T383625]]) * 21:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 ([[phab:T383625|T383625]]) * 21:24 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19 ([[phab:T383238|T383238]]) * 21:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 ([[phab:T383625|T383625]]) * 21:19 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 21:18 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 21:18 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-21 ([[phab:T383238|T383238]]) * 21:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 ([[phab:T383625|T383625]]) * 21:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 ([[phab:T383625|T383625]]) * 21:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 ([[phab:T383238|T383238]]) * 21:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-2 ([[phab:T383238|T383238]]) * 21:14 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-75 ([[phab:T383238|T383238]]) * 21:13 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-75 ([[phab:T383238|T383238]]) * 21:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45 ([[phab:T383625|T383625]]) * 21:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-2 ([[phab:T383238|T383238]]) * 21:08 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 21:05 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45 ([[phab:T383625|T383625]]) * 21:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 21:03 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13 ([[phab:T383238|T383238]]) * 20:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13 ([[phab:T383238|T383238]]) * 20:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16 ([[phab:T383238|T383238]]) * 20:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36 ([[phab:T383625|T383625]]) * 20:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16 ([[phab:T383238|T383238]]) * 20:53 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 20:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 ([[phab:T383625|T383625]]) * 20:49 dcaro: restart prometheus to pick up the new ips for vms and such * 20:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 20:47 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 20:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-8 * 20:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 20:43 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-20 ([[phab:T383625|T383625]]) * 20:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20 ([[phab:T383625|T383625]]) * 20:42 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-20 ([[phab:T383238|T383238]]) * 20:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20 ([[phab:T383238|T383238]]) * 20:42 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 20:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-8 * 20:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 20:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 20:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 * 20:36 lucaswerkmeister: restore root-owned /tmp/framer.txt on tools-sgebastion-10, tools-bastion-12, tools-bastion-13 (cf. 
2025-01-05 log entry) following bastion reboots === 2025-01-12 === * 09:53 taavi: hard reboot tools-k8s-worker-nfs-55 === 2025-01-08 === * 18:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43 ([[phab:T383238|T383238]]) * 18:34 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43 ([[phab:T383238|T383238]]) * 18:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-32 ([[phab:T383238|T383238]]) * 18:26 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-32 ([[phab:T383238|T383238]]) * 18:19 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17 ([[phab:T383238|T383238]]) * 18:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17 ([[phab:T383238|T383238]]) * 18:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 18:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 18:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-47 ([[phab:T383238|T383238]]) * 18:06 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-47 ([[phab:T383238|T383238]]) * 18:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-41 ([[phab:T383238|T383238]]) * 18:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-41 ([[phab:T383238|T383238]]) * 18:04 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-8 ([[phab:T383238|T383238]]) * 17:59 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-8 ([[phab:T383238|T383238]]) * 17:59 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-27 ([[phab:T383238|T383238]]) * 17:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-27 ([[phab:T383238|T383238]]) * 17:53 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-67 ([[phab:T383238|T383238]]) * 17:48 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67 ([[phab:T383238|T383238]]) * 17:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37 ([[phab:T383238|T383238]]) * 17:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37 ([[phab:T383238|T383238]]) * 17:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-26 ([[phab:T383238|T383238]]) * 17:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-26 ([[phab:T383238|T383238]]) * 17:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76 ([[phab:T383238|T383238]]) * 17:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76 ([[phab:T383238|T383238]]) * 17:27 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 ([[phab:T383238|T383238]]) * 17:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 ([[phab:T383238|T383238]]) * 17:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12 ([[phab:T383238|T383238]]) * 17:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12 
([[phab:T383238|T383238]]) * 17:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-48 ([[phab:T383238|T383238]]) * 17:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-48 ([[phab:T383238|T383238]]) * 16:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 16:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 16:51 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-65 ([[phab:T383238|T383238]]) * 16:45 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-65 ([[phab:T383238|T383238]]) * 16:38 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 ([[phab:T383238|T383238]]) * 16:33 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 ([[phab:T383238|T383238]]) * 16:25 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 16:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 16:00 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 15:55 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 15:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36 * 15:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 * 15:40 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 15:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 * 15:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-42 * 15:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-42 * 15:29 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22 * 15:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-22 * 15:09 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45 * 15:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45 * 14:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70 * 14:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70 * 14:25 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-70 * 14:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-70 * 14:16 dcaro: reboot tools-static-15 nfs is stuck === 2025-01-07 === * 00:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 00:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 00:14 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 00:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 00:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component 
components-api * 00:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 00:09 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 00:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 00:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor === 2025-01-06 === * 23:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 23:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 23:56 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 23:55 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 23:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 23:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 23:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 23:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 23:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 23:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 23:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for 
component maintain-harbor * 16:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor === 2025-01-05 === * 18:58 lucaswerkmeister: remove /tmp/framer.txt on tools-bastion-13 (I notified the owner privately), and replace it with a root-owned file to prevent iTerm from leaking logs into it (https://iterm2.com/downloads/stable/iTerm2-3_5_11.changelog) on tools-sgebastion-10, tools-bastion-12 and tools-bastion-13 === 2025-01-03 === * 21:46 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-69 * 21:41 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-69 * 21:40 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-69 * 21:35 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-69 === 2025-01-02 === * 02:28 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-61 * 02:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-61 === 2025-01-01 === * 21:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7 * 21:05 andrewbogott: truncating *.err and *.out files to clear out NFS space * 21:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7 * 21:04 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-34 * 20:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-34 === 2024-12-13 === * 14:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 14:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
toolforge-weld * 14:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 14:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 09:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 * 09:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 * 09:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 * 09:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 * 08:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-73 * 08:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-73 === 2024-12-12 === * 10:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-5 * 10:47 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-5 === 2024-12-06 === * 17:26 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-1 ([[phab:T352206|T352206]]) * 17:25 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-1 ([[phab:T352206|T352206]]) * 17:24 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-3 ([[phab:T352206|T352206]]) * 17:23 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-3 ([[phab:T352206|T352206]]) * 07:56 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 07:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2024-12-05 === * 16:34 
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:42 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 14:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 14:06 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 13:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2024-12-04 === * 19:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 19:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 19:26 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 19:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 17:46 andrewbogott: rebooting tools-legacy-redirector-2, many probes failing * 17:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 17:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 17:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 17:03 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 16:54 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:47 sstefanova@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component volume-admission * 16:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:45 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 15:45 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:26 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 15:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 15:11 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component envvars-api * 15:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 15:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 14:46 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:45 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 01:31 
raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:18 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:17 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:17 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:15 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:14 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:12 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
components-api === 2024-12-03 === * 22:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 22:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 22:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 21:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 21:55 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component main * 21:55 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component main === 2024-11-29 === * 03:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 03:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 03:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 03:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 03:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 03:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2024-11-27 === * 18:26 taavi: kubectl sudo rollout restart -n kube-system deployment coredns # update resolv.conf in coredns containers === 2024-11-26 === * 10:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-7 * 10:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:36 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for 
tools-k8s-control-7 * 10:35 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:34 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7 * 10:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:32 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7 * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:31 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7 * 10:30 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-9 * 10:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-9 * 10:22 dcaro: rebooting k8s-control-9 * 10:18 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 * 10:17 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 * 10:17 dcaro: rebooting k8s-control-8 * 09:15 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 09:14 dcaro: restarting tools-k8s-worker-nfs-72 * 09:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 * 09:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70 * 09:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70 * 09:12 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50 * 09:12 dcaro: restarting tools-k8s-worker-nfs-70 * 09:11 dcaro: restarting 
tools-k8s-worker-nfs-50 * 09:11 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50 * 09:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17 * 09:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17 * 08:34 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-61 * 08:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-61 * 07:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers ([[phab:T380827|T380827]]) * 06:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers ([[phab:T380827|T380827]]) === 2024-11-25 === * 13:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 12:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli === 2024-11-23 === * 07:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder ([[phab:T358225|T358225]]) * 07:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder ([[phab:T358225|T358225]]) === 2024-11-20 === * 15:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 15:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 14:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 14:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 12:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component envvars-admission * 12:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 00:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission ([[phab:T362867|T362867]]) * 00:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission ([[phab:T362867|T362867]]) === 2024-11-19 === * 21:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 21:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 21:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 21:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 21:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 21:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 21:05 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 20:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 20:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer * 20:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 20:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 20:38 
raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api * 20:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 20:31 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component envvars-api ([[phab:T362867|T362867]]) * 20:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362867|T362867]]) * 20:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api ([[phab:T362867|T362867]]) * 20:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362867|T362867]]) * 20:17 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T362867|T362867]]) * 20:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T362867|T362867]]) * 20:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T362867|T362867]]) * 20:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T362867|T362867]]) * 19:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission ([[phab:T362867|T362867]]) * 19:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission ([[phab:T362867|T362867]]) * 19:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission ([[phab:T362867|T362867]]) * 19:23 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component ingress-admission ([[phab:T362867|T362867]])
* 15:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 15:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
=== 2024-11-18 ===
* 14:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
* 14:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 14:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 14:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 11:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 11:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
=== 2024-11-15 ===
* 14:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-5.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:04 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-5.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:03 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T352206|T352206]])
* 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]])
* 13:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T352206|T352206]])
* 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T352206|T352206]])
* 13:50 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix 
(exit_code=99) with prefix 'tools-db' ([[phab:T352206|T352206]])
* 13:49 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]])
=== 2024-11-14 ===
* 13:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
* 13:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 13:04 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 13:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 13:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
=== 2024-11-12 ===
* 15:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 10:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
* 10:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 10:11 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 10:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
=== 2024-11-11 ===
* 16:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 15:58 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-4.tools.eqiad1.wikimedia.cloud 
([[phab:T352206|T352206]])
* 14:44 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:42 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]])
* 14:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T352206|T352206]])
* 14:37 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]])
* 14:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 13:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
=== 2024-11-10 ===
* 02:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362867|T362867]])
* 02:47 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.11.0 ([[phab:T362867|T362867]])
* 02:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362867|T362867]])
=== 2024-11-06 ===
* 16:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 16:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 15:48 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 10:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 ([[phab:T379139|T379139]])
* 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-24 ([[phab:T379139|T379139]])
* 07:57 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice
* 07:52 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 07:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 07:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-11-05 ===
* 17:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 17:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 09:40 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 08:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 08:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics
* 08:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 08:17 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 07:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico
* 07:44 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico
=== 2024-11-04 ===
* 16:39 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 16:34 sstefanova@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api
* 16:30 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 16:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 16:22 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 16:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 15:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 14:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 14:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.27.16 to 1.28.14 
([[phab:T362867|T362867]])
* 14:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-76 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-76 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-75 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-75 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-74 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-74 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-73 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-73 from 1.27.16 to 1.28.14 
([[phab:T362867|T362867]])
* 14:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-72 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-72 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-71 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-71 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-70 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-70 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-69 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-68 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-68 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-67 from 
1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-67 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-66 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-66 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-65 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-65 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:25 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:20 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 
1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 14:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node 
tools-k8s-worker-nfs-50 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:55 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:52 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node 
tools-k8s-worker-nfs-45 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:50 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:44 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-worker-nfs-39 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:43 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:37 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:31 raymond-ndibe@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:25 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:20 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:13 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:10 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:10 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
* 13:09 raymond-ndibe@cloudcumin1001: START - 
Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 13:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 13:04 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
* 13:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 12:55 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:39 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:35 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:22 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 12:16 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 12:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 12:11 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api
* 12:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 12:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:59 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api
* 11:58
sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 11:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]])
* 11:19 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 11:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 10:56 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 10:42 dcaro: added api.svc.toolforge.org dns record entry
* 10:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 10:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 10:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 10:11 sstefanova@cloudcumin1001: START - Cookbook
wmcs.toolforge.component.deploy for component registry-admission
* 09:56 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
* 09:55 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 09:51 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
* 09:48 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 09:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 09:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
=== 2024-10-22 ===
* 13:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23
* 13:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23
* 12:58 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-33, tools-k8s-woker-nfs-23
* 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33, tools-k8s-woker-nfs-23
* 09:05 arturo: restart puppetserver service for [[phab:T377803|T377803]]
=== 2024-10-16 ===
* 09:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 09:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component
jobs-api
=== 2024-10-15 ===
* 17:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 17:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 16:16 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
* 16:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
=== 2024-10-14 ===
* 09:14 dcaro: migrating pipelineruns stored versions to v1 ([[phab:T376710|T376710]])
* 07:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9
* 07:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9
* 07:24 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9
* 07:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9
=== 2024-10-09 ===
* 09:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-10-08 ===
* 13:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld ([[phab:T376710|T376710]])
* 13:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld ([[phab:T376710|T376710]])
* 12:38 dcaro: tests are passing correctly, upgrade finished, will investigate the increased slowness as a followup
* 12:27 dcaro:
upgrade finished, build actions have become slower than usual ([[phab:T376710|T376710]]), running tests and investigating
* 12:02 dcaro: starting toolforge builds-builder upgrade, no downtime expected though some builds might fail to start/list/log/show while the upgrade is in progress [[phab:T374908|T374908]]
* 08:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 08:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 08:24 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers
* 08:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-10-04 ===
* 11:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 11:51 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 11:44 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 11:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2024-10-02 ===
* 09:11 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers
* 09:07 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-10-01 ===
* 10:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 10:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 10:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 10:28 dcaro: updated ci image with latest precommit
versions
* 10:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 09:52 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission
* 09:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
=== 2024-09-30 ===
* 18:25 taavi: run striker migrations [[phab:T359428|T359428]]
=== 2024-09-28 ===
* 00:14 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 00:07 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
=== 2024-09-27 ===
* 23:58 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 23:52 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
=== 2024-09-26 ===
* 16:45 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 16:40 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 16:24 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 16:18 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 16:18 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
* 16:08 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 16:05 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 15:58 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 10:26
wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 10:20 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 10:12 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli
* 10:05 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
* 07:53 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 07:46 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
=== 2024-09-25 ===
* 08:00 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7
* 07:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7
=== 2024-09-24 ===
* 22:11 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T375157|T375157]])
* 22:03 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T375157|T375157]])
* 21:48 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component kyverno ([[phab:T359641|T359641]])
* 21:41 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component kyverno ([[phab:T359641|T359641]])
=== 2024-09-20 ===
* 20:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T341066|T341066]])
* 20:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]])
* 20:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for
component calico ([[phab:T341066|T341066]])
* 20:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]])
* 19:36 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico ([[phab:T341066|T341066]])
* 19:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]])
* 17:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 17:06 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/pod2daemon-flexvol:v3.28.2 ([[phab:T359641|T359641]])
* 17:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 17:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 17:04 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/typha:v3.28.2 ([[phab:T359641|T359641]])
* 17:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 17:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 17:03 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/node:v3.28.2 ([[phab:T359641|T359641]])
* 17:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 17:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 17:02 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/kube-controllers:v3.28.2
([[phab:T359641|T359641]])
* 17:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 16:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 16:59 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/ctl:v3.28.2 ([[phab:T359641|T359641]])
* 16:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 16:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 16:56 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/cni:v3.28.2 ([[phab:T359641|T359641]])
* 16:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 16:54 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/calico/cni:v3.28.2 ([[phab:T359641|T359641]])
* 16:54 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 06:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=1)
* 00:39 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T359641|T359641]])
* 00:32 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T359641|T359641]])
=== 2024-09-19 ===
* 23:17 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=97) ([[phab:T359641|T359641]])
* 23:17 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.10 ([[phab:T359641|T359641]])
* 23:17 wmbot~raymondndibe@wmf3402: START -
Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 23:12 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 23:11 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.10.1 ([[phab:T359641|T359641]])
* 23:11 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 22:38 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 22:37 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]])
* 22:37 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 22:36 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]])
* 22:36 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]])
* 22:36 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 22:35 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=97) ([[phab:T359641|T359641]])
* 22:35 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]])
* 22:35 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 17:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli ([[phab:T341066|T341066]])
* 17:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for
component jobs-cli ([[phab:T341066|T341066]])
* 17:13 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api ([[phab:T341066|T341066]])
* 17:06 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 16:48 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]])
* 16:46 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 16:45 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api
* 16:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 16:38 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 16:26 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 16:10 dcaro: rebooting tools-k8s-worker-nfs-24 it's stuck without network
* 16:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 16:08 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 16:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 16:07 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 16:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 15:28 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]])
* 15:27 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 15:19
wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]])
* 15:18 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 15:08 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]])
* 15:07 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 15:01 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api ([[phab:T341066|T341066]])
* 14:57 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
* 14:56 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api ([[phab:T341066|T341066]])
* 14:50 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]])
=== 2024-09-17 ===
* 08:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70 ([[phab:T359641|T359641]])
* 08:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70 ([[phab:T359641|T359641]])
* 08:43 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]])
* 08:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 ([[phab:T359641|T359641]])
* 08:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]])
* 08:40 wmbot~dcaro@urcuchillay: START - Cookbook
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 ([[phab:T359641|T359641]])
* 08:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]])
* 08:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]])
* 03:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 03:13 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-64
* 03:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-63
* 03:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]])
* 03:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook
wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 03:07 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-76.tools.eqiad1.wikimedia.cloud to the cluster
* 03:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]])
* 03:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 03:00 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud to the cluster
* 02:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 02:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 02:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 02:46 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-74.tools.eqiad1.wikimedia.cloud to the cluster
* 02:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-62
* 02:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-60
* 02:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-62 ([[phab:T359641|T359641]])
* 02:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]])
* 02:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 02:38 raymond-ndibe@cloudcumin1001: Added a new k8s
worker-nfs tools-k8s-worker-nfs-73.tools.eqiad1.wikimedia.cloud to the cluster
* 02:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 02:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 02:32 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-72.tools.eqiad1.wikimedia.cloud to the cluster
* 02:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 02:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 02:24 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-71.tools.eqiad1.wikimedia.cloud to the cluster
* 02:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 02:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 02:12 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 02:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-6
* 02:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-56
* 02:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 02:08 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud to the cluster
* 02:05
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]])
* 02:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]])
* 02:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-49
* 02:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-31
* 01:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 01:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 01:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 01:57 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-69.tools.eqiad1.wikimedia.cloud to the cluster
* 01:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]])
* 01:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]])
* 01:56 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-30
* 01:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]])
* 01:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-29
* 01:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-30
([[phab:T359641|T359641]])
* 01:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]])
* 01:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 01:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]])
* 01:46 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-64 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]])
* 01:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-28
* 01:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 01:42 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-68.tools.eqiad1.wikimedia.cloud to the cluster
* 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]])
* 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-64 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:40 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-63 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]])
* 01:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-62
([[phab:T359641|T359641]])
* 01:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:34 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-62 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]])
* 01:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 01:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 01:32 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-67.tools.eqiad1.wikimedia.cloud to the cluster
* 01:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-62 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]])
* 01:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]])
* 01:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:28 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-60 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 01:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 01:23 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-66.tools.eqiad1.wikimedia.cloud to the cluster
* 01:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]])
* 01:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-60 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:22 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-6 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]])
* 01:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]])
* 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:15 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-56 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:14 raymond-ndibe@cloudcumin1001: END
(PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]])
* 01:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]])
* 01:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]])
* 01:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-49 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]])
* 01:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.26.15 to 1.27.16
([[phab:T359641|T359641]])
* 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]])
* 00:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:59 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-31 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-30 ([[phab:T359641|T359641]])
* 00:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-30 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]])
* 00:47 raymond-ndibe@cloudcumin1001: START - Cookbook
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]])
* 00:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-29 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]])
* 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]])
* 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:41 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-28 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.26.15 to 1.27.16
([[phab:T359641|T359641]])
* 00:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-60, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-62, tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]])
* 00:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-56, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]])
* 00:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-56, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]])
* 00:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-49, tools-k8s-worker-nfs-50 ([[phab:T359641|T359641]])
* 00:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-60, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-62, tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]])
* 00:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-31,
tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-33, tools-k8s-worker-nfs-36 ([[phab:T359641|T359641]])
=== 2024-09-16 ===
* 17:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45
* 17:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45
* 17:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6
* 17:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6
=== 2024-09-13 ===
* 11:18 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54 ([[phab:T374692|T374692]])
* 11:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54 ([[phab:T374692|T374692]])
* 09:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:12 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
=== 2024-09-12 ===
* 12:06 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for
tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:54 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-23, tools-k8s-worker-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23, tools-k8s-worker-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-28 ([[phab:T374612|T374612]])
* 11:37 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-28 ([[phab:T374612|T374612]])
=== 2024-09-11 ===
* 10:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-09-09 ===
* 16:23 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component cert-manager
* 16:16 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component cert-manager
=== 2024-09-06 ===
* 08:47 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 08:42 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 08:38 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 08:36 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 07:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0)
* 07:14 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/pause:3.6
* 07:14
sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry
=== 2024-09-05 ===
* 13:50 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:50 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/stakater-reloader:v1.1.0 ([[phab:T359641|T359641]])
* 13:50 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:46 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:45 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]])
* 13:45 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:41 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]])
* 13:41 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]])
* 13:41 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:40 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]])
* 13:40 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]])
* 13:40 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:28 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:27 wmbot~raymondndibe@wmf3402: Updating container image
docker-registry.tools.wmflabs.org/cert-manager/cainjector:v1.15.3 ([[phab:T359641|T359641]])
* 13:27 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:26 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:26 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/webhook:v1.15.3 ([[phab:T359641|T359641]])
* 13:26 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:24 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:23 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/controller:v1.15.3 ([[phab:T359641|T359641]])
* 13:23 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
=== 2024-09-04 ===
* 14:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:02 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 13:56 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 13:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 13:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component
volume-admission
* 13:36 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 13:35 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 13:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 13:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 13:02 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 13:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
=== 2024-09-03 ===
* 20:19 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 19:53 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 19:48 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 19:36 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 19:29 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 15:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component kyverno
* 15:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component kyverno
* 15:29 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component kyverno
* 15:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component kyverno
* 14:41 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
* 14:41 wmbot~dcaro@urcuchillay: START - Cookbook
wmcs.openstack.cloudvirt.vm_console
* 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 14:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 14:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.28.5 ([[phab:T359641|T359641]])
* 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.12.5 ([[phab:T359641|T359641]])
* 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.12.5 ([[phab:T359641|T359641]])
* 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.12.5 ([[phab:T359641|T359641]])
* 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.12.5 ([[phab:T359641|T359641]])
* 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.12.5 ([[phab:T359641|T359641]])
* 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.12.5 ([[phab:T359641|T359641]])
* 14:04 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry ([[phab:T359641|T359641]])
* 13:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 13:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:55 wmbot~dcaro@urcuchillay: Updating container image
docker-registry.tools.wmflabs.org/bitnami-kubectl:1.28.5 ([[phab:T359641|T359641]])
* 13:54 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.12.5 ([[phab:T359641|T359641]])
* 13:54 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry ([[phab:T359641|T359641]])
* 13:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 13:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 13:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 13:04 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 11:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 11:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 10:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 10:15 dcaro@cloudcumin1001: START - Cookbook
wmcs.toolforge.component.deploy for component builds-api
* 09:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 09:51 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 05:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.25.16 to 1.26.15
* 05:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.25.16 to 1.26.15
* 05:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.25.16 to 1.26.15
* 05:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.25.16 to 1.26.15
* 05:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.25.16 to 1.26.15
* 05:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.25.16 to 1.26.15
=== 2024-09-02 ===
* 14:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:30 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade
(exit_code=0) for node tools-k8s-worker-106 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-64 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-64 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:33
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.25.16 to 1.26.15
* 13:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.25.16 to 1.26.15
* 13:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.25.16 to 1.26.15
* 13:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.25.16 to 1.26.15
* 13:30 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.25.16 to 1.26.15
* 13:30 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-62 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.25.16 to 1.26.15
* 13:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.25.16 to 1.26.15
* 13:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-62 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:27 sstefanova@cloudcumin1001: START -
Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.25.16 to 1.26.15
* 13:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-60 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-60 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:25 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.25.16 to 1.26.15
* 13:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.25.16 to 1.26.15
* 13:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.25.16 to 1.26.15
* 13:22 sstefanova@cloudcumin1001:
START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.25.16 to 1.26.15 * 13:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.25.16 to 1.26.15 * 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.25.16 to 1.26.15 * 13:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:17 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-51 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:17 
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.25.16 to 1.26.15 * 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.25.16 to 1.26.15 * 13:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.25.16 to 1.26.15 * 13:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.25.16 to 1.26.15 * 13:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.25.16 to 1.26.15 * 
13:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.25.16 to 1.26.15 * 13:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:11 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.25.16 to 1.26.15 * 13:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.25.16 to 1.26.15 * 13:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.25.16 to 
1.26.15 * 13:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.25.16 to 1.26.15 * 13:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.25.16 to 1.26.15 * 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.25.16 to 1.26.15 * 13:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.25.16 to 1.26.15 * 13:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.25.16 to 1.26.15 * 13:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:04 
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.25.16 to 1.26.15 * 13:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:02 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:01 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.25.16 to 1.26.15 * 13:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:01 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.25.16 to 1.26.15 * 13:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 13:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.25.16 to 1.26.15 * 12:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.25.16 to 1.26.15 * 
12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.25.16 to 1.26.15 * 12:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.25.16 to 1.26.15 * 12:56 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.25.16 to 1.26.15 * 12:55 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.25.16 to 1.26.15 * 12:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) 
* 12:54 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.25.16 to 1.26.15 * 12:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:47 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.25.16 to 1.26.15 * 12:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.25.16 to 1.26.15 * 12:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.25.16 to 1.26.15 * 12:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.25.16 to 1.26.15 * 12:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.25.16 to 1.26.15 * 12:40 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.25.16 to 1.26.15 * 12:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.25.16 to 1.26.15 * 12:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.25.16 to 1.26.15 * 12:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.25.16 to 1.26.15 * 12:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.25.16 to 1.26.15 * 12:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.25.16 to 1.26.15 * 12:31 sstefanova@cloudcumin1001: 
START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.25.16 to 1.26.15 * 12:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.25.16 to 1.26.15 * 12:27 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.25.16 to 1.26.15 * 12:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.25.16 to 1.26.15 * 12:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.25.16 to 1.26.15 * 12:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.25.16 to 1.26.15 * 12:12 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.25.16 to 1.26.15 * 12:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.25.16 to 1.26.15 * 12:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.25.16 to 1.26.15 * 11:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.25.16 to 1.26.15 * 11:48 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.25.16 to 1.26.15 * 11:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.25.16 to 1.26.15 * 11:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.25.16 to 1.26.15 * 10:05 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) 
for component builds-api * 09:58 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 09:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 09:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 09:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 08:48 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component components-api * 08:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2024-08-29 === * 16:32 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 16:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 08:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 07:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2024-08-27 === * 12:06 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 12:06 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/nginx-ingress-controller:v1.11.2 * 12:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 09:46 wmbot~dcaro@urcuchillay: END (PASS) 
- Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:46 wmbot~dcaro@urcuchillay: Added a new k8s worker tools-k8s-worker-108.tools.eqiad1.wikimedia.cloud to the cluster * 09:36 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 08:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 08:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component calico * 08:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component calico * 08:55 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 08:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 08:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-52 ([[phab:T373243|T373243]]) * 08:37 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-52 ([[phab:T373243|T373243]]) * 08:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-51 ([[phab:T373243|T373243]]) * 08:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-51 ([[phab:T373243|T373243]]) * 08:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-25 ([[phab:T373243|T373243]]) * 08:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-25 ([[phab:T373243|T373243]]) * 08:31 
wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-18 ([[phab:T373243|T373243]]) * 08:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-18 ([[phab:T373243|T373243]]) * 08:29 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-15 ([[phab:T373243|T373243]]) * 08:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-15 ([[phab:T373243|T373243]]) * 08:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]]) * 08:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]]) * 08:19 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 08:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster === 2024-08-26 === * 21:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 21:13 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-64.tools.eqiad1.wikimedia.cloud to the cluster * 21:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 21:03 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster * 21:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 20:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 20:23 wmbot~dcaro@urcuchillay: Added a 
new k8s worker-nfs tools-k8s-worker-nfs-63.tools.eqiad1.wikimedia.cloud to the cluster * 20:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 20:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 20:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase * 18:35 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 17:49 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-62.tools.eqiad1.wikimedia.cloud to the cluster * 17:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 17:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase * 17:33 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 17:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 17:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase * 17:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 17:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:04 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook 
wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 17:04 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-61.tools.eqiad1.wikimedia.cloud to the cluster * 16:54 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:54 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:54 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-60.tools.eqiad1.wikimedia.cloud to the cluster * 16:42 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 16:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:14 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-58.tools.eqiad1.wikimedia.cloud to the cluster * 16:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:02 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:02 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-57.tools.eqiad1.wikimedia.cloud to the cluster * 15:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:49 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 15:44 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:39 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:38 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster * 15:35 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:33 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:15 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]]) * 13:12 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-4, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-18, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-51, tools-k8s-worker-nfs-52, tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 13:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-4, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-18, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-51, tools-k8s-worker-nfs-52, tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:53 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:44 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:42 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 11:06 dcaro: manually deleted the coredns pods that had been around for 4d * 09:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 09:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 08:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 08:18 dcaro: scale up coredns deployment to 4 replicas === 2024-08-21 === * 05:44 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 05:38 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 05:27 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 05:20 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 05:01 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 04:55 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 04:43 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component volume-admission
* 04:36 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 04:28 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission
* 04:25 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 04:22 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission
* 04:21 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 04:20 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission
* 04:20 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 04:10 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 04:03 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 03:49 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 03:42 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 03:33 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 03:28 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 03:19 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
* 03:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 03:13 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
=== 2024-08-19 ===
* 22:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24
* 21:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24
* 21:52 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17
* 21:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17
* 21:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-17,tools-k8s-worker-nfs-24
* 21:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17,tools-k8s-worker-nfs-24
=== 2024-08-15 ===
* 06:30 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-20
* 06:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20
=== 2024-08-13 ===
* 09:54 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 07:39 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6
* 07:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6
=== 2024-08-12 ===
* 15:33 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 12:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 12:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 11:51 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice
* 11:46 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice
* 10:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 10:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 09:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 09:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
=== 2024-08-08 ===
* 16:57 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 16:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 16:36 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 16:30 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 16:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 16:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2024-08-06 ===
* 09:50 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=1)
* 09:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 09:50 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 09:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 09:20 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 09:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 09:20 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 09:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 09:19 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 09:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
=== 2024-08-05 ===
* 13:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api
* 13:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api
* 11:42 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 11:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 09:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 09:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 08:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 08:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
=== 2024-08-01 ===
* 20:42 bd808: Uncordoned tools-k8s-worker-nfs-55 following reboot
* 20:40 bd808: Hard reboot of tools-k8s-worker-nfs-55 following drain cookbook run. Stuck pod remained stuck as expected.
* 20:37 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-55
* 20:32 bd808: Draining and rebooting tools-k8s-worker-nfs-55 after reports of stuck pods via irc
* 20:32 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-55
* 15:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api
* 15:31 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api
=== 2024-07-31 ===
* 20:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli
* 20:36 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
* 20:26 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-cli
* 20:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
* 16:17 andrewbogott: changing login.tools.wmlabs.org to point to a newer bastion, tools-bastion-12, in response to [[phab:T371505|T371505]]
* 11:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 11:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 11:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api
* 11:33 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api
* 10:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-43
* 09:49 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-43
=== 2024-07-30 ===
* 18:08 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli
* 18:06 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
* 18:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli
* 18:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
* 18:02 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli
* 18:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
* 18:02 wmbot~raymond@ubuntu: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-cli
* 18:01 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
* 17:59 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli
* 17:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
* 17:49 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli
* 17:49 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
* 17:40 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli
* 17:39 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
* 17:37 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 17:36 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 16:34 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23
* 16:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23
=== 2024-07-29 ===
* 18:24 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 18:23 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 18:06 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 18:05 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 16:24 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 16:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 14:05 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
* 14:03 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
* 13:19 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli
* 13:18 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli
* 12:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-cli
* 12:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-cli
* 12:01 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-cli
* 12:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-cli
=== 2024-07-25 ===
* 15:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 15:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 08:37 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 08:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
=== 2024-07-24 ===
* 09:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx
* 09:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx
* 08:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
* 08:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
* 07:07 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component ingress-admission
* 06:57 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
=== 2024-07-23 ===
* 15:04 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 15:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
* 13:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 13:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 12:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 12:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 12:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 12:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 12:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 08:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 08:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-07-22 ===
* 17:42 dcaro: moved the apt repo to service endpoint deb.svc.toolforge.org
* 17:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-3
* 17:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-3
* 17:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 17:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 17:00 dcaro: moving the toolforge apt repo to tools-services-06
* 16:55 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-services-06.tools.eqiad1.wikimedia.cloud
* 16:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-services-06.tools.eqiad1.wikimedia.cloud
* 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 09:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-07-19 ===
* 12:46 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0)
* 12:46 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.9.2
* 12:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry
* 10:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0)
* 10:02 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/nginx-ingress-controller:v1.9.6
* 10:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry
=== 2024-07-18 ===
* 14:39 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 14:39 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 08:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 08:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-07-17 ===
* 14:50 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 11:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 11:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 11:12 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-builder
* 11:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 10:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 10:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 10:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 10:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 09:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 09:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 08:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx
* 08:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx
=== 2024-07-16 ===
* 15:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 15:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.24.17 to 1.25.16
* 14:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.24.17 to 1.25.16
* 14:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.24.17 to 1.25.16
* 14:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.24.17 to 1.25.16
* 14:09 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.24.17 to 1.25.16
* 14:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.24.17 to 1.25.16
* 11:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 11:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 11:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 11:31 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.24.17 to 1.25.16
* 11:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.24.17 to 1.25.16
* 11:30 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.24.17 to 1.25.16
* 11:28 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.24.17 to 1.25.16
* 11:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.24.17 to 1.25.16
* 11:27 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.24.17 to 1.25.16
* 11:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-25 from 1.24.17 to 1.25.16
* 11:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 11:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-25 from 1.24.17 to 1.25.16
* 11:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.24.17 to 1.25.16
* 11:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 11:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.24.17 to 1.25.16
* 11:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.24.17 to 1.25.16
* 11:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 11:22 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.24.17 to 1.25.16
* 11:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.24.17 to 1.25.16
* 11:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.24.17 to 1.25.16
* 11:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.24.17 to 1.25.16
* 11:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.24.17 to 1.25.16
* 11:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.24.17 to 1.25.16
* 11:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.24.17 to 1.25.16
* 11:13 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.24.17 to 1.25.16
* 11:12 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.24.17 to 1.25.16
* 11:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16
* 11:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16
* 11:10 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-nfs-worker-21 from 1.24.17 to 1.25.16
* 11:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-nfs-worker-21 from 1.24.17 to 1.25.16
* 11:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21
* 11:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21
* 10:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-104 from 1.24.17 to 1.25.16
* 10:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.24.17 to 1.25.16
* 10:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.24.17 to 1.25.16
* 10:57 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16
* 10:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.24.17 to 1.25.16
* 10:55 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.24.17 to 1.25.16
* 10:54 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.24.17 to 1.25.16
* 10:53 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 1.24.17 to 1.25.16
* 10:52 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.24.17 to 1.25.16
* 10:51 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.24.17 to 1.25.16
* 10:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16
* 10:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.24.17 to 1.25.16
* 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.24.17 to 1.25.16
* 10:50 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.24.17 to 1.25.16
* 10:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.24.17 to 1.25.16
* 10:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.24.17 to 1.25.16
* 10:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.24.17 to 1.25.16
* 10:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.24.17 to 1.25.16
* 10:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-18 from 1.24.17 to 1.25.16
* 10:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-18 from 1.24.17 to 1.25.16
* 10:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.24.17 to 1.25.16
* 10:47 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.24.17 to 1.25.16
* 10:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.24.17 to 1.25.16
* 10:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.24.17 to 1.25.16
* 10:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.24.17 to 1.25.16
* 10:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.24.17 to 1.25.16
* 10:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-15 from 1.24.17 to 1.25.16
* 10:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-52 from 1.24.17 to 1.25.16
* 10:44 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-15 from 1.24.17 to 1.25.16
* 10:44 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.24.17 to 1.25.16
* 10:44 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-52 from 1.24.17 to 1.25.16
* 10:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.24.17 to 1.25.16
* 10:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.24.17 to 1.25.16
* 10:43 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-51 from 1.24.17 to 1.25.16
* 10:42 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.24.17 to 1.25.16
* 10:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.24.17 to 1.25.16
* 10:42 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.24.17 to 1.25.16
* 10:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.24.17 to 1.25.16
* 10:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.24.17 to 1.25.16
* 10:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.24.17 to 1.25.16
* 10:40 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.24.17 to 1.25.16
* 10:40 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.24.17 to 1.25.16
* 10:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.24.17 to 1.25.16
* 10:40 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.24.17 to 1.25.16
* 10:39 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.24.17 to 1.25.16
* 10:39 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.24.17 to 1.25.16
* 10:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.24.17 to 1.25.16
* 10:39 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.24.17 to 1.25.16
* 10:38 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.24.17 to 1.25.16
* 10:38 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.24.17 to 1.25.16
* 10:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.24.17 to 1.25.16
* 10:37 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.24.17 to 1.25.16
* 10:37 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.24.17 to 1.25.16
* 10:37 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.24.17 to 1.25.16
* 10:36 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.24.17 to 1.25.16
* 10:35 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.24.17 to 1.25.16
* 10:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.24.17 to 1.25.16
* 10:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.24.17 to 1.25.16
* 10:34 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.24.17 to 1.25.16
* 10:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.24.17 to 1.25.16
* 10:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.24.17 to 1.25.16
* 10:31 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.24.17 to 1.25.16
* 10:31 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.24.17 to 1.25.16
* 10:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.24.17 to 1.25.16
* 10:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.24.17 to 1.25.16
* 10:28 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.24.17 to 1.25.16
* 10:27 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.24.17 to 1.25.16
* 10:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.24.17 to 1.25.16
* 10:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.24.17 to 1.25.16
* 10:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.24.17 to 1.25.16
* 10:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.24.17 to 1.25.16
* 10:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.24.17 to 1.25.16
* 10:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.24.17 to 1.25.16
* 10:22 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.24.17 to 1.25.16
* 10:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.24.17 to 1.25.16
* 10:20 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.24.17 to 1.25.16
* 10:19 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.24.17 to 1.25.16
* 10:18 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.24.17 to 1.25.16
* 10:17 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.24.17 to 1.25.16
* 10:16 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.24.17 to 1.25.16
* 10:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.24.17 to 1.25.16
* 10:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.24.17 to 1.25.16
* 10:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16
* 10:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission
* 10:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.24.17 to 1.25.16
* 10:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission
* 10:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16
* 10:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.24.17 to 1.25.16
* 10:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.24.17 to 1.25.16
* 10:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.24.17 to 1.25.16
* 10:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.24.17 to 1.25.16
* 10:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.24.17 to 1.25.16
* 10:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.24.17 to 1.25.16
* 10:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.24.17 to 1.25.16
* 10:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.24.17 to 1.25.16
* 10:11 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16
* 10:11
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:10 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16 * 10:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16 * 10:10 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-4 from 1.24.17 to 1.25.16 * 10:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.24.17 to 1.25.16 * 10:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.24.17 to 1.25.16 * 10:09 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-4 from 1.24.17 to 1.25.16 * 10:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.24.17 to 1.25.16 * 10:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.24.17 to 1.25.16 * 10:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.24.17 to 1.25.16 * 09:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.24.17 to 1.25.16 * 09:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.24.17 to 1.25.16 * 09:50 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-1 from 1.24.17 to 1.25.16 * 09:50 aborrero@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-1 from 1.24.17 to 1.25.16 * 09:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.24.17 to 1.25.16 * 09:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.24.17 to 1.25.16 * 09:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.24.17 to 1.25.16 * 09:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.24.17 to 1.25.16 * 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.24.17 to 1.25.16 * 09:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.24.17 to 1.25.16 * 09:07 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.24.17 to 1.25.16 * 09:06 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.24.17 to 1.25.16 * 08:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 08:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission === 2024-07-15 === * 14:42 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:42 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:40 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
builds-builder * 08:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 08:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission === 2024-07-11 === * 17:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 13:49 dcaro: deploy toolforge-jobs-framework 16.0.13 ([[phab:T369573|T369573]]) * 11:55 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 11:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission === 2024-07-10 === * 17:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 17:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 16:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 16:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 16:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 16:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 15:16 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component 
maintain-kubeusers * 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 10:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-07-09 === * 14:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 14:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 14:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:18 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-07-08 === * 20:22 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37 * 20:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37 * 14:09 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 13:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-3 * 13:57 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-3 * 13:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-2 * 13:56 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-2 * 13:56 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-1 * 13:56 
andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-1 * 13:36 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 13:36 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 13:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 13:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 12:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission * 12:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission * 12:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 08:46 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 08:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2024-07-05 === * 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 12:34 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 12:29 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 12:29 
wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4 * 12:29 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7 * 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7 * 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7 * 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7 * 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 12:27 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 12:27 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7 * 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7 * 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7 * 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7 * 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 12:26 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 12:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 12:23 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.7.0 * 12:23 
sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 11:29 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) copy image from bitnami/kubectl:1.26.4 to docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4 * 11:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4 * 11:28 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry copy image from bitnami/kubectl:1.26.4 to docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4 * 01:47 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 01:46 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-07-04 === * 17:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 17:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 12:57 arturo: updating kubelet flags [[phab:T355881|T355881]] * 12:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 12:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 11:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 11:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 09:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 
09:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 07:54 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 07:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway === 2024-07-03 === * 12:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 10:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 09:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission === 2024-07-02 === * 17:16 andrewbogott: draining (I hope) tools-elastic-3 and tools-elastic-1 for [[phab:T311905|T311905]] * 17:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 17:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 16:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 16:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 15:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 15:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 11:53 arturo: cleanup kubeadm configmap from TTLAfterFinished settings 
([[phab:T349197|T349197]]) * 11:51 arturo: remove --feature-gates=TTLAfterFinished=true from kube-controller-manager static pod definition ([[phab:T349197|T349197]]) * 10:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:56 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 09:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 09:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component cert-manager * 09:22 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component cert-manager * 09:10 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 09:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-07-01 === * 15:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 14:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 14:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 13:21 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission * 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission * 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission === 2024-06-28 === * 11:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 11:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 09:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 09:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 09:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 09:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 09:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 09:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 09:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway === 2024-06-27 === * 16:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-23 * 16:44 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-23 * 16:22 andrew@cloudcumin1001: END 
(PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-db-1 * 16:21 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-1 * 15:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-db-1 * 15:49 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-1 * 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-db-3 * 15:46 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-3 * 15:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-24 * 15:37 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-24 * 15:36 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-22 * 15:33 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-22 * 15:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component cert-manager * 15:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component cert-manager * 14:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 14:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx * 11:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:02 arturo: drop all PSP definitions for all accounts ([[phab:T368142|T368142]]) * 
10:02 arturo: disabled PodSecurityPolicy admission plugin from kubeadm configmap ([[phab:T368142|T368142]]) * 09:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 08:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-06-26 === * 11:40 taavi: update pywikibot image to 9.2 [[phab:T363631|T363631]] * 10:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:18 arturo: deploying toolforge-webservice 0.103.9 ([[phab:T368463|T368463]]) * 09:18 arturo: setting kyverno policies to Enforce ([[phab:T368141|T368141]]) * 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-29 * 08:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-29 === 2024-06-25 === * 21:50 bd808: Live hacked /usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py on login-buster.toolforge.org to remove the `-> dict[str, Any]` type annotations causing [[phab:T368463|T368463]] * 12:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-104 * 12:30 taavi@cloudcumin1001: START 
- Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-104 * 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-103 * 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-104 * 12:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-104 * 12:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-103 * 12:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-102 * 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-103 * 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-103 * 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-102 * 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-56 * 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-102 * 12:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-102 * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-56 * 12:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-55 * 12:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-55 * 12:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-54 * 12:22 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-56 * 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-56 * 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-54 * 12:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-53 * 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-55 * 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-55 * 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-53 * 12:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-54 * 12:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-nfs-52 * 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-54 * 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-52 * 12:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 12:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-51 * 12:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-53 * 12:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-51 * 12:11 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-53 * 11:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-50 * 11:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-52 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-50 * 11:56 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-50 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-50 * 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-52 * 11:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-51 * 11:51 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-50 * 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-51 * 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-50 * 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-50 * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-50 * 11:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-proxy-7 * 11:10 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-proxy-7 * 11:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.migrate_floating_ip (exit_code=0) for address 185.15.56.11 to server 'tools-proxy-8' * 11:09 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.migrate_floating_ip for address 185.15.56.11 to server 'tools-proxy-8' * 09:44 arturo: deploy toolforge-webservice 0.103.8 ([[phab:T362050|T362050]]) * 09:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-haproxy-6 * 09:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-haproxy-6 * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-9 * 09:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-9 * 09:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-9 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-9 * 08:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-49 * 08:48 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-49 * 08:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-48 * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-49 * 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-48 * 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-49 * 08:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs 
(exit_code=0) for server tools-k8s-worker-nfs-47 * 08:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-48 * 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-48 * 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-47 * 08:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-46 * 08:44 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-46 * 08:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-45 * 08:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-47 * 08:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-47 * 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-45 * 08:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-44 * 08:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-46 * 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-46 * 08:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-44 * 08:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-45 * 08:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-45 * 08:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs 
(exit_code=99) for server tools-k8s-worker-nfs-43 * 08:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-43 * 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-42 * 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-44 * 08:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-44 * 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-43 * 08:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-43 * 08:36 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-42 * 08:13 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-42 * 08:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-42 * 08:07 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-42 * 08:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-41 * 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-42 * 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-41 * 08:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-40 * 07:59 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-40 * 07:59 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-39 * 07:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-41 * 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-41 * 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-39 * 07:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-38 * 07:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-40 * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-40 * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-38 * 07:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-37 * 07:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-39 * 07:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-39 * 07:55 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-37 * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-36 * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-38 * 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-38 * 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-36 * 07:41 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-35 * 07:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-37 * 07:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-37 * 07:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-35 * 07:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-34 * 07:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-34 * 07:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-35 * 07:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-33 * 07:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-35 * 07:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-34 * 07:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-34 * 07:31 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-33 * 07:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-33 * 07:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-33 === 2024-06-24 === * 20:56 andrewbogott: rebooting 
tools-k8s-worker-nfs-36; it has lots of stuck processes which somehow didn't get unstuck when we did the post-nfs-migration reboots. * 15:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-32 * 15:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-32 * 15:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-31 * 15:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-32 * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-31 * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-32 * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-30 * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-31 * 15:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-31 * 15:48 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-30 * 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-29 * 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-30 * 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-30 * 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-29 * 15:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for 
server tools-k8s-worker-nfs-28 * 15:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-29 * 15:45 arturo: deploy toolforge-webservice 0.103.7 ([[phab:T362050|T362050]]) * 15:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-29 * 15:44 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-28 * 15:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-27 * 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-28 * 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-27 * 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-28 * 15:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-27 * 15:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-27 * 15:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-sgebastion-10 * 14:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-sgebastion-10 * 14:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-bastion-13 * 14:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-bastion-13 * 14:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-bastion-12 * 14:30 taavi@cloudcumin1001: START - Cookbook 
wmcs.openstack.migrate_server_to_ovs for server tools-bastion-12 * 14:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 14:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-nfs-2 * 14:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-nfs-2 * 13:57 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-nfs-2 * 13:57 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-nfs-2 * 13:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_dbinstance_to_ovs (exit_code=0) for server tbd * 13:43 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_dbinstance_to_ovs for server tbd * 13:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-26 * 13:41 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-26 * 13:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-25 * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-25 * 13:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-26 * 13:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-24 * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-26 * 13:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-25 * 13:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-24 * 
13:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-25 * 13:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-23 * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-24 * 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-23 * 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-24 * 13:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-22 * 13:29 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-22 * 13:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-21 * 13:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-23 * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-23 * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-21 * 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-20 * 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-22 * 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-22 * 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-20 * 13:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-21 * 
13:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-19 * 13:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-21 * 13:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-19 * 13:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-18 * 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-18 * 13:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-20 * 13:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-17 * 13:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-20 * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-19 * 13:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-19 * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-18 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-18 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-17 * 13:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-17 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-17 * 13:15 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-17 * 13:15 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-17 * 13:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-16 * 13:09 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-16 * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-15 * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-16 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-16 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-15 * 12:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-14 * 12:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-15 * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-15 * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-14 * 12:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-13 * 12:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-14 * 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-14 * 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-13 * 12:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-12 
* 12:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-13 * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-13 * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-12 * 12:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-11 * 12:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-12 * 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-11 * 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-12 * 12:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-prometheus-7 * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-11 * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-11 * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-prometheus-7 * 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-8 * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-8 * 12:15 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-8 * 12:13 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-8 * 12:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:12 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-static-15 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-static-15 * 12:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-acme-chief-4 * 12:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-acme-chief-4 * 12:00 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-10 * 11:58 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=97) for node tools-k8s-worker-nfs-10 * 11:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-10 * 11:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-10 * 11:56 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-10 * 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-10 * 11:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 11:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-9 * 11:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-9 * 11:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-8 * 11:41 taavi@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-9 * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-8 * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-9 * 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-8 * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-8 * 11:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-7 * 11:37 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-8 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-7 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-8 * 11:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-7 * 11:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-7 * 11:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-6 * 11:33 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-6 * 11:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-5 * 11:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-5 * 11:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-6 * 11:31 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-4 * 11:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-6 * 11:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-5 * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-4 * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-5 * 11:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-4 * 11:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-4 * 11:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-3 * 11:25 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-3 * 11:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-2 * 11:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-2 * 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-1 * 11:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-1 * 11:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-3 * 11:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-3 * 11:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-2 * 11:20 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-2 * 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-1 * 11:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1 * 11:17 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-1 * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1 * 10:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-5 * 10:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-5 * 10:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-docker-registry-7 * 10:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-docker-registry-7 * 10:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-7 * 10:11 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-43 * 10:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-7 * 10:09 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-43 * 10:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-7 * 10:06 taavi@cloudcumin1001: START - Cookbook 
wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-7 * 10:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-7 * 10:03 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-43 * 10:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-7 * 10:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-6 * 09:59 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-6 * 09:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-43 * 09:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-cumin-1 * 09:52 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-cumin-1 * 09:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-haproxy-5 * 09:50 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-haproxy-5 * 09:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-harbor-1 * 09:47 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-harbor-1 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-107.tools.eqiad1.wikimedia.cloud to the cluster * 09:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-prometheus-6 * 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server 
tools-prometheus-6 * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-puppetserver-01 * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-puppetserver-01 * 09:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-puppetdb-2 * 09:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-puppetdb-2 * 09:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-mail-4 * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:30 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-106.tools.eqiad1.wikimedia.cloud to the cluster * 09:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-mail-4 * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-legacy-redirector-2 * 09:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-legacy-redirector-2 * 09:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-imagebuilder-2 * 09:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-imagebuilder-2 * 09:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-proxy-8 * 09:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-proxy-8 * 09:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-services-05 * 
09:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-services-05 * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-package-builder-04 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-package-builder-04 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-docker-registry-8 * 09:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 09:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-docker-registry-8 * 09:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-checker-5 * 09:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:18 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-105.tools.eqiad1.wikimedia.cloud to the cluster * 09:18 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-checker-5 * 09:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:08 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 09:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster === 2024-06-20 === * 13:09 arturo: re-deploy kyverno [[phab:T368044|T368044]] * 12:56 aborrero@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 09:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 09:08 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-06-19 === * 10:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 10:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 10:11 arturo: merging k8s HAproxy change https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047113 * 04:18 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 04:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 04:16 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 04:15 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-06-14 === * 14:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
jobs-api * 08:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 08:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 07:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 07:35 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-06-12 === * 19:41 bd808: Rebuilding all shared Docker containers. This will among other things apply the fix for [[phab:T367345|T367345]]. * 17:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 17:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 17:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 17:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 16:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 16:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 15:24 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 15:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 15:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 
13:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 13:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 13:45 taavi: hard reboot tools-k8s-control-7 * 12:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-06-11 === * 17:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers * 16:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 16:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 16:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 16:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 16:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 16:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 15:50 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all NFS workers * 15:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 11:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 11:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:12 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:57 dcaro: cleaning old maintain-kubeusers configmaps * 10:45 dcaro: cleaning up old resourcequotas === 2024-06-10 === * 09:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 09:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno === 2024-06-07 === * 10:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 10:09 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 09:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 09:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-06-06 === * 14:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:06 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-06-05 
=== * 16:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 13:27 dcaro: deploying toolforge-webservice 0.103.6 * 12:58 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 12:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 08:44 dcaro: deploying toolforge-jobs-framework-cli 16.0.10 on tools-bastion-13 * 08:41 dcaro: deploying toolforge-jobs-framework-cli 16.0.10 on tools-bastion-12 === 2024-06-04 === * 16:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:47 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 12:47 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 12:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
maintain-kubeusers * 09:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 08:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-06-03 === * 16:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 16:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 16:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:04 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 16:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 16:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 15:58 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:57 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:11 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:11 aborrero@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:16 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 10:15 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7 * 10:15 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7 * 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7 * 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7 * 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 10:14 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 10:13 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 10:13 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 10:13 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:37 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 09:37 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 09:37 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:29 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry 
(exit_code=99) * 09:29 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 09:29 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 09:28 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:13 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 08:43 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 08:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission === 2024-05-29 === * 16:14 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 16:13 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 02:59 wmbot~raymond@ubuntu: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component envvars-api * 02:59 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-05-28 === * 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-05-27 === * 15:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:50 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 09:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 09:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 === 2024-05-25 === * 21:33 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 21:32 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 20:38 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 20:37 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-05-23 === * 13:22 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 13:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2024-05-22 === * 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 16:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 === 2024-05-15 === * 14:17 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]]) * 14:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]]) * 14:11 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]]) * 14:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]]) * 10:26 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-05-14 === * 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 13:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 07:48 dcaro: draining tools-k8s-worker-nfs-9 as it's stuck on IO * 07:48 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-9 * 07:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-9 === 2024-05-07 === * 16:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 16:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-05-06 === * 12:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 12:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 08:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 08:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 07:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 07:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 
2024-05-05 === * 07:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 07:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx === 2024-05-03 === * 15:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 15:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-04-30 === * 10:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-04-26 === * 08:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 08:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:57 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 08:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-04-25 === * 12:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:57 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component jobs-api * 09:48 taavi: update pywikibot script image to v9.1.0 [[phab:T363132|T363132]] === 2024-04-24 === * 15:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 15:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-04-18 === * 09:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-04-17 === * 20:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50 * 20:48 andrewbogott: In response to stuck processes (NFS?), running sudo cookbook wmcs.toolforge.k8s.reboot --hostname-list tools-k8s-worker-nfs-50 --cluster-name tools * 20:48 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50 * 15:21 dcaro: swapped login.toolforge.org to point to tools-bastion-13 * 10:48 dcaro: rebooting tools-k8s-worker-nfs-1 === 2024-04-16 === * 11:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-1 * 11:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1 * 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'python3-toolforge-weld' version '1.5.0' * 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'python3-toolforge-weld' version '1.5.0' === 2024-04-15 === * 20:34 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 20:33 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
builds-api * 18:28 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 18:27 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 14:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 13:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 13:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 13:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 13:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 11:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 09:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2024-04-12 === * 10:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 10:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission * 09:35 wmbot~dcaro@urcuchillay: END (PASS) - 
Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 09:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
* 09:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 09:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 01:19 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 01:18 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 01:18 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component calico
* 01:17 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
* 01:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component calico
* 01:17 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 01:16 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
* 01:16 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 01:15 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 01:14 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 01:13 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 01:12 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 01:11 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-04-11 ===
* 08:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 08:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2024-04-09 ===
* 17:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 17:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 17:11 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 17:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 16:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 16:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 14:23 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 14:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 14:23 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 14:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 14:22 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
* 14:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 14:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:11 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 13:43 dcaro: deployed builds-builder 0.0.94 and removed builds-admission
* 13:39 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 13:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 12:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:21 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 12:19 dcaro: deploying toolforge-jobs-cli 16.0.6
=== 2024-04-08 ===
* 16:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0)
* 16:24 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 16:21 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 16:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 16:09 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99)
* 16:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node
* 15:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
* 14:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 14:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
* 14:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 14:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0)
* 14:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 14:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21
* 14:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21
* 13:56 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:54 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:53 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-56
* 13:53 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 13:52 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-56
* 13:51 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:45 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:40 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:37 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 13:32 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 13:31 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 13:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255)
* 13:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 13:24 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:19 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 13:12 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99)
* 13:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]])
* 10:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 10:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 08:55 dcaro_: deploy toolforge-jobs-framework-cli 16.0.5
=== 2024-04-05 ===
* 12:15 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:15 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-04-03 ===
* 15:01 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 15:00 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:59 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:59 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:58 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:58 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:57 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:57 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:49 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:49 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 14:37 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api
* 14:37 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 11:24 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06
* 11:24 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06
* 11:23 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06
* 11:23 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06
* 11:21 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06
* 11:21 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06
* 09:45 taavi: rebuilding prebuild images for [[phab:T361457|T361457]]
=== 2024-04-02 ===
* 12:39 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-2 ([[phab:T344717|T344717]])
* 12:38 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-2 ([[phab:T344717|T344717]])
* 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-registry-05
* 07:54 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-registry-05
=== 2024-03-28 ===
* 14:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-05
* 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-05
* 13:45 taavi: migrating toolforge.org floating IP from tools-proxy-06 to tools-proxy-7 [[phab:T361223|T361223]]
* 13:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-proxy'
* 13:30 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-proxy'
* 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-proxy'
* 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-proxy'
* 12:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-registry-06
* 12:12 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-registry-06
* 11:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-docker-registry'
* 11:02 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-docker-registry'
=== 2024-03-27 ===
* 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance toolserver-proxy-01
* 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance toolserver-proxy-01
=== 2024-03-26 ===
* 16:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud
* 16:47 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud
* 16:41 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud
* 16:39 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud
* 16:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-docker-registry'
* 16:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-docker-registry'
* 12:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-bastion-13.tools.eqiad1.wikimedia.cloud
* 12:54 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-13.tools.eqiad1.wikimedia.cloud
* 12:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-bastion'
* 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-bastion'
* 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-sgebastion-11
* 12:43 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-sgebastion-11
* 10:24 taavi: point toolserver.org DNS to tools-legacy-redirector-2 [[phab:T311909|T311909]]
=== 2024-03-25 ===
* 18:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-legacy-redirector
* 18:23 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-legacy-redirector
* 14:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
* 14:27 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
* 14:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
* 14:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
* 14:18 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
* 14:18 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud
=== 2024-03-22 ===
* 11:43 dcaro: restarted sssd on tools-prometheus-6 as it was stopped (error)
=== 2024-03-21 ===
* 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-4
* 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-4
* 15:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-3
* 15:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-3
* 15:42 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=99) for node toolsbeta-k8s-haproxy-3
* 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node toolsbeta-k8s-haproxy-3
* 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0)
* 15:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node
* 12:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0)
* 12:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node
=== 2024-03-20 ===
* 13:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-checker-04
* 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-checker-04
* 12:30 taavi: move checker service address to tools-checker-5
* 11:24 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 11:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 10:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-checker-5.tools.eqiad1.wikimedia.cloud
* 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-checker-5.tools.eqiad1.wikimedia.cloud
* 10:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-checker-5.tools.eqiad1.wikimedia.cloud
* 10:39 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-checker-5.tools.eqiad1.wikimedia.cloud
* 10:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-checker'
* 10:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker'
* 10:33 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-checker'
* 10:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker'
* 10:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 10:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase
* 10:22 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-checker'
* 10:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker'
=== 2024-03-19 ===
* 21:28 taavi: kick off full container image rebuild for https://gerrit.wikimedia.org/r/1012753 (python3 backwards compat in lighttpd images) and https://gerrit.wikimedia.org/r/1010690 (add procps to base images)
* 11:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-static-14
* 11:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-static-14
* 11:19 taavi: point dev.toolforge.org to tools-bastion-12 [[phab:T314665|T314665]]
* 10:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 10:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 09:38 dcaro: pushed docker-registry.tools.wmflabs.org/cloud-cicd-py311bookworm-tox:latest and docker-registry.tools.wmflabs.org/cloud-cicd-debian-builder-bookworm:2024-03-24.1 ([[phab:T360405|T360405]])
=== 2024-03-18 ===
* 13:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:30 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 13:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-104 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:13 taavi: restart harbor services after docker service restart
* 13:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 13:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-52 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:58 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-52 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:58 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-51 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:57 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:57 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:56 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:53 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:47 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 12:44 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 12:36 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:35 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:34 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-filesystemtest-1
* 12:33 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:33 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-filesystemtest-1
* 12:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:29 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:27 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:25 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:25 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:24 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:22 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:22 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:20 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-25 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-25 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:18 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:18 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-acme-chief-4.tools.eqiad1.wikimedia.cloud
* 12:15 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:15 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:14 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-4.tools.eqiad1.wikimedia.cloud
* 12:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:11 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 12:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud
* 12:04 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 12:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 12:01 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 12:01 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud
* 12:00 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud
* 12:00 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud
* 11:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]])
* 11:55 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:53 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-18 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-18 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-15 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-15 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:47 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]])
* 11:42 aborrero@cloudcumin1001: START -
Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:40 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:39 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:39 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:33 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:33 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-4 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-4 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for 
node tools-k8s-worker-nfs-3 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:30 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:23 taavi: point tools-static proxy to tools-static-15 (bookworm) [[phab:T311913|T311913]] * 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:13 aborrero@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 11:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 11:00 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-api * 11:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 10:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 10:04 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-bastion-12.tools.eqiad1.wikimedia.cloud * 10:03 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.refresh_puppet_certs on tools-bastion-12.tools.eqiad1.wikimedia.cloud
* 09:27 taavi: deleted shutdown grid engine VMs [[phab:T314664|T314664]]
=== 2024-03-15 ===
* 10:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 10:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-03-14 ===
* 17:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'misctools' version '1.48'
* 17:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'misctools' version '1.48'
* 15:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-imagebuilder-01
* 15:16 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01
* 15:11 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-docker-imagebuilder-01
* 15:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01
* 15:10 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-docker-imagebuilder-01
* 15:09 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01
* 11:02 taavi: stop grid related VMs [[phab:T314664|T314664]]
* 11:01 taavi: disable grid access for remaining tools still running on the grid [[phab:T314664|T314664]]
=== 2024-03-13 ===
* 19:21 andrewbogott: shutting down old puppet infra: tools-puppetmaster-02 and tools-puppetdb-1. These can be deleted in a week or two presuming everything remains stable.
=== 2024-03-12 ===
* 12:38 taavi: hard reboot tools-prometheus-6
* 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-03-11 ===
* 16:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 16:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 13:20 arturo: cached registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0 as docker-registry.tools.wmflabs.org/kube-state-metrics:v2.6.0 in the docker registry for [[phab:T359798|T359798]]
=== 2024-03-09 ===
* 12:48 taavi: hard reboot tools-sgebastion-10 due to stuck NFS procs
=== 2024-03-08 ===
* 12:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 12:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-03-07 ===
* 14:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 13:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 13:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-03-06 ===
* 10:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-32
* 10:47 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_grid_node (exit_code=1) for tools-sgeweblight-10-17, tools-sgeweblight-10-32
* 10:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-17, tools-sgeweblight-10-32
* 10:34 taavi: rebuilding all docker images for https://gerrit.wikimedia.org/r/c/operations/docker-images/toollabs-images/+/1005952 ([[phab:T293552|T293552]]) + normal package updates
* 09:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0)
* 09:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors
* 09:42 taavi: reboot tools-sgeexec-10-20, -21, -23, sgeweblight-10-32 due to stuck nfs procs
=== 2024-03-05 ===
* 16:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
* 16:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
* 16:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 16:09 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 16:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 16:07 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase
* 16:06 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.openstack.quota_increase (exit_code=97) ([[phab:T357901|T357901]])
* 16:06 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T357901|T357901]])
* 16:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
* 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud
=== 2024-03-04 ===
* 17:56 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 17:56 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 16:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:57 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 12:43 taavi: reboot tools-sgegrid-shadow due to high number of procs in D state
=== 2024-03-03 ===
* 10:38 dcaro: reboot tools-k8s-worker-nfs-55 got nfs lockup (logrotate in D state)
=== 2024-03-01 ===
* 21:14 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 21:14 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2024-02-29 ===
* 14:36 dcaro: deploy webservice 0.103.3
=== 2024-02-28 ===
* 11:57 dcaro: deploy tools-webservice 0.103.2 with probes ([[phab:T341919|T341919]])
* 00:46 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 00:46 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-02-26 ===
* 09:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) ([[phab:T284656|T284656]])
* 09:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node ([[phab:T284656|T284656]])
* 09:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster
* 09:35 aborrero@cloudcumin1001: Added a new k8s control tools-k8s-control-9.tools.eqiad1.wikimedia.cloud to the cluster
* 09:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster ([[phab:T284656|T284656]])
=== 2024-02-23 ===
* 14:19 taavi: remove isc-dhcp-server (server, not client) from tools-db-2
* 13:32 taavi: remove toolschecker alerts for grid engine jobs [[phab:T358333|T358333]]
=== 2024-02-22 ===
* 14:26 
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 14:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 14:24 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:17 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:17 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:07 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component envvars-api * 14:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 14:03 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component envvars-api * 14:03 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 11:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) ([[phab:T284656|T284656]]) * 11:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node ([[phab:T284656|T284656]]) * 11:15 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 11:15 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-104.tools.eqiad1.wikimedia.cloud to the cluster * 11:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 10:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:51 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component jobs-api * 09:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster * 09:39 aborrero@cloudcumin1001: Added a new k8s control tools-k8s-control-8.tools.eqiad1.wikimedia.cloud to the cluster * 09:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster ([[phab:T284656|T284656]]) * 08:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-51 * 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-51 * 08:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-38 * 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-38 * 08:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-25 * 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-25 === 2024-02-21 === * 17:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 17:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 15:48 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 15:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 14:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:34 wmbot~dcaro@urcuchillay: END (PASS) - 
Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 09:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-control-4 * 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-control-4 * 09:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster * 09:20 taavi@cloudcumin1001: Added a new k8s control tools-k8s-control-7.tools.eqiad1.wikimedia.cloud to the cluster * 09:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster === 2024-02-20 === * 16:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 16:12 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-103.tools.eqiad1.wikimedia.cloud to the cluster * 16:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-102 * 16:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-102 * 16:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-101 * 15:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-101 * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 15:48 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-102.tools.eqiad1.wikimedia.cloud to the cluster * 15:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-102 * 15:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-102 * 15:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 15:38 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-102.tools.eqiad1.wikimedia.cloud to the cluster * 15:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud * 15:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud * 12:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:57 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-56.tools.eqiad1.wikimedia.cloud to the cluster * 12:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-100 * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-100 * 12:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:40 taavi@cloudcumin1001: Added a new k8s 
worker-nfs tools-k8s-worker-nfs-55.tools.eqiad1.wikimedia.cloud to the cluster * 12:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-99 * 12:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-99 * 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:29 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-54.tools.eqiad1.wikimedia.cloud to the cluster * 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-98 * 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-98 * 12:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:18 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-53.tools.eqiad1.wikimedia.cloud to the cluster * 12:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-97 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-97 * 11:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-52.tools.eqiad1.wikimedia.cloud to the cluster * 11:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a 
worker-nfs role in the tools cluster * 11:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-96 * 11:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-96 * 11:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:36 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud to the cluster * 11:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:26 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-50.tools.eqiad1.wikimedia.cloud to the cluster * 11:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:16 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-49.tools.eqiad1.wikimedia.cloud to the cluster * 11:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-95 * 11:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-95 * 10:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-94 * 10:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-94 * 10:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host 
tools-k8s-worker-93 * 10:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-93 * 10:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 10:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-48.tools.eqiad1.wikimedia.cloud to the cluster * 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-92 * 10:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-92 * 09:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-ingress-6 * 09:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-6 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-9.tools.eqiad1.wikimedia.cloud to the cluster * 09:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-47.tools.eqiad1.wikimedia.cloud to the cluster * 09:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 09:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-91 * 09:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-91 * 09:15 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:15 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-46.tools.eqiad1.wikimedia.cloud to the cluster * 09:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:02 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 09:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-90 * 08:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-90 * 08:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:57 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-45.tools.eqiad1.wikimedia.cloud to the cluster * 08:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-89 * 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-89 * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:47 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-44.tools.eqiad1.wikimedia.cloud to the cluster * 08:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-88 * 08:36 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-88 === 2024-02-19 === * 19:04 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 19:03 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-ingress-5 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-5 * 13:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:09 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-43.tools.eqiad1.wikimedia.cloud to the cluster * 12:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-87 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-87 * 12:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-42.tools.eqiad1.wikimedia.cloud to the cluster * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-86 * 12:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-86 * 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:44 taavi@cloudcumin1001: Added a new k8s worker-nfs 
tools-k8s-worker-nfs-41.tools.eqiad1.wikimedia.cloud to the cluster * 12:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T357901|T357901]]) * 12:33 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T357901|T357901]]) * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud * 12:24 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-85 * 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-85 * 12:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:18 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-40.tools.eqiad1.wikimedia.cloud to the cluster * 12:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-84 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-84 * 12:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:04 taavi@cloudcumin1001: Added a new k8s worker-nfs 
tools-k8s-worker-nfs-39.tools.eqiad1.wikimedia.cloud to the cluster * 11:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-83 * 11:53 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-83 * 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:50 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud to the cluster * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-82 * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-82 * 11:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:39 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-37.tools.eqiad1.wikimedia.cloud to the cluster * 11:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-81 * 11:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-81 * 09:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:57 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy 
(exit_code=0) for component maintain-kubeusers * 08:57 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-02-16 === * 15:28 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 12:21 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-8.tools.eqiad1.wikimedia.cloud to the cluster * 12:14 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 10:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 10:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 10:32 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 10:31 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:59 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-36.tools.eqiad1.wikimedia.cloud to the cluster * 09:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-80 * 09:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host 
tools-k8s-worker-80 * 09:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:45 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-35.tools.eqiad1.wikimedia.cloud to the cluster * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-79 * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-79 * 09:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:24 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-34.tools.eqiad1.wikimedia.cloud to the cluster * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-78 * 09:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-78 * 09:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:05 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-33.tools.eqiad1.wikimedia.cloud to the cluster * 08:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-77 * 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-77 === 2024-02-15 === * 13:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host 
tools-k8s-ingress-4 * 13:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-4 * 13:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:02 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-32.tools.eqiad1.wikimedia.cloud to the cluster * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-76 * 12:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-76 * 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:44 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-31.tools.eqiad1.wikimedia.cloud to the cluster * 12:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-75 * 12:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-75 * 11:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 11:37 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-7.tools.eqiad1.wikimedia.cloud to the cluster * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 11:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-ingress-7 * 11:29 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-ingress-7 * 11:29 taavi@cloudcumin1001: END (FAIL) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a ingress role in the tools cluster * 11:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster === 2024-02-14 === * 19:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-17, tools-sgeweblight-10-30 * 16:35 taavi: kill jobs that user 'wikishizhao' is running directly on the grid, per https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rules #3 * 16:30 taavi: reboot tools-sgeexec-10-23 due to high load * 09:14 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud * 09:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:07 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-30.tools.eqiad1.wikimedia.cloud to the cluster * 08:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-74 * 08:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-74 * 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:54 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-29.tools.eqiad1.wikimedia.cloud to the cluster * 08:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 08:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-73 * 08:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-73 * 08:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:43 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-28.tools.eqiad1.wikimedia.cloud to the cluster * 08:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-72 * 08:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-72 * 08:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:32 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-27.tools.eqiad1.wikimedia.cloud to the cluster * 08:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-71 * 08:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-71 * 08:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:21 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-26.tools.eqiad1.wikimedia.cloud to the cluster * 08:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-70 * 08:07 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-70 * 08:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:05 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud to the cluster * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-69 * 07:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-69 * 07:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 07:53 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-24.tools.eqiad1.wikimedia.cloud to the cluster * 07:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 07:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-68 * 07:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-68 === 2024-02-13 === * 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-67 * 15:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-67 * 15:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 15:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-23.tools.eqiad1.wikimedia.cloud to the cluster * 15:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:31 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-66 * 15:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-66 * 15:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 15:30 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-22.tools.eqiad1.wikimedia.cloud to the cluster * 15:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-65 * 15:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-65 * 09:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:36 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-21.tools.eqiad1.wikimedia.cloud to the cluster * 09:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-64 * 09:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-64 === 2024-02-12 === * 14:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 14:58 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-20.tools.eqiad1.wikimedia.cloud to the cluster * 14:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-62 * 14:47 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-62 * 14:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 14:47 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-19.tools.eqiad1.wikimedia.cloud to the cluster * 14:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-61 * 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-61 * 13:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-60 * 13:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-60 * 13:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:43 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-18.tools.eqiad1.wikimedia.cloud to the cluster * 13:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-59 * 13:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-59 * 13:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-58 * 13:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-58 * 13:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:22 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-17.tools.eqiad1.wikimedia.cloud 
to the cluster * 13:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-57 * 13:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-57 * 13:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-56 * 13:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-56 * 13:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:09 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-16.tools.eqiad1.wikimedia.cloud to the cluster * 12:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-55 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-55 * 12:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-54 * 12:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-54 * 12:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-15.tools.eqiad1.wikimedia.cloud to the cluster * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-15 * 12:45 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-15 * 12:44 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-53 * 12:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-53 * 12:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-52 * 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-52 * 10:51 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 10:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-02-11 === * 11:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-02-09 === * 18:03 andrewbogott: updated the default security group, removing the 0.0.0.0/0 rule allowing port 22 access everywhere, replaced it with a 172.16.0.0/21 rule * 13:06 taavi: reboot tools-sgecron-2 due to high load * 10:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component image-config * 10:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component image-config * 09:56 taavi@cloudcumin1001: 
END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-14.tools.eqiad1.wikimedia.cloud to the cluster * 09:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-51 * 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-51 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-50 * 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-50 * 08:56 dcaro: restart tools-k8s-worker-50 due to some processes stuck in D state === 2024-02-08 === * 13:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 13:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-13.tools.eqiad1.wikimedia.cloud to the cluster * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-49 * 09:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-49 * 09:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-48 * 09:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-48 * 09:32 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:32 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-12.tools.eqiad1.wikimedia.cloud to the cluster * 09:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-47 * 09:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-47 * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-46 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-46 * 09:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:21 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-11.tools.eqiad1.wikimedia.cloud to the cluster * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-45 * 09:11 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-45 * 09:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-44 * 09:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-44 * 09:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:10 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-10.tools.eqiad1.wikimedia.cloud to the cluster * 09:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a 
worker-nfs role in the tools cluster * 08:59 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 08:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-43 * 08:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-43 * 08:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-42 * 08:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-42 === 2024-02-07 === * 21:33 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers * 18:00 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 17:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 * 17:24 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 17:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 17:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:03 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for all workers * 17:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:01 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers === 
2024-02-06 === * 13:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes ([[phab:T356507|T356507]]) * 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all nodes ([[phab:T356507|T356507]]) === 2024-01-31 === * 14:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-01-30 === * 19:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 19:24 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-9.tools.eqiad1.wikimedia.cloud to the cluster * 19:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 19:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-9 * 19:16 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-9 * 19:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 19:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 19:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 19:12 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-8.tools.eqiad1.wikimedia.cloud to the cluster * 19:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 19:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-8 * 19:03 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-8 * 18:51 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-8 * 18:47 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-8 * 18:46 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 18:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-7.tools.eqiad1.wikimedia.cloud to the cluster * 18:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-41 * 18:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-41 * 18:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-40 * 18:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-40 * 18:22 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-39 * 18:22 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-39 * 18:18 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-38 * 18:17 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-38 * 18:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-37 * 18:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-37 * 15:16 dcaro: restart harbor now that the db is clean ([[phab:T356037|T356037]]) * 15:14 dcaro: restart harbor now that the db is clean ([[phab:T3543|T3543]]) * 13:08 taavi: create no-op DMARC record [[phab:T354112|T354112]] * 12:39 dcaro: rebuilding all the toolforge images ([[phab:T354320|T354320]]) * 10:16 dcaro: restarting harbor and flushing redis to regenerate cache data ([[phab:T356037|T356037]]) * 09:33 dcaro: cleaning up old schedules on harbor ([[phab:T356037|T356037]]) === 2024-01-29 === * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-36 * 19:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-36 * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-36 * 14:36 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-mail-4.tools.eqiad1.wikimedia.cloud * 14:34 wmbot~taavi@runko: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-mail-4.tools.eqiad1.wikimedia.cloud * 12:06 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:06 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-6.tools.eqiad1.wikimedia.cloud to the cluster * 11:55 wmbot~taavi@runko: START - Cookbook 
wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:51 wmbot~taavi@runko: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 11:51 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:37 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:37 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-5.tools.eqiad1.wikimedia.cloud to the cluster * 11:26 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:23 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:22 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-4.tools.eqiad1.wikimedia.cloud to the cluster * 11:12 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:12 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-35 * 11:10 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-35 * 11:10 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-34 * 11:09 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-34 * 11:09 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-33 * 11:07 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-33 * 11:06 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-32 * 11:04 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-32 * 11:01 wmbot~taavi@runko: 
START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-31 * 10:59 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-30 * 10:57 wmbot~taavi@runko: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 10:56 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:51 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 10:51 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-3.tools.eqiad1.wikimedia.cloud to the cluster * 10:46 blancadesal: increased harbor quota for wd-shex-infer to 2GiB * 10:44 blancadesal: increased harbor quota for lucaswerkmeister-test to 2GiB * 10:31 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 10:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-01-26 === * 10:56 taavi: copy helmfile_0.144.0-1_all to bookworm-tools, bookworm-toolsbeta === 2024-01-25 === * 13:17 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:04 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-24 === * 09:54 dcaro: deploy toolforge-jobs-framework-cli 16.0.1 === 2024-01-23 === * 19:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 19:11 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 14:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 14:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 13:31 taavi: rebooting tools-sgeexec-10-21, tools-sgeexec-10-22 * 12:58 dcaro: deployed toolforge-envvars-cli 0.0.4 * 10:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-19 === * 15:40 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 15:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-01-18 === * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-17 === 2024-01-17 === * 18:16 dhinus: increase volume quotas for toolsdb [[phab:T344717|T344717]] * 18:14 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) ([[phab:T344717|T344717]]) * 18:14 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase 
([[phab:T344717|T344717]]) * 14:34 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 14:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 08:56 taavi: update all pre-built docker images [[phab:T352886|T352886]] === 2024-01-15 === * 09:18 taavi: reboot stuck tools-k8s-worker-84 === 2024-01-12 === * 09:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-builds-cli' version '0.0.12' * 09:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-builds-cli' version '0.0.12' === 2024-01-11 === * 17:30 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 17:12 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:12 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 15:14 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 15:13 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-10 === * 22:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 22:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 09:17 taavi: reboot tools-k8s-worker-98 === 2024-01-09 === * 23:37 andrewbogott: restarting harbor-db in an attempt to reform harbor -- [[phab:T354714|T354714]] * 23:30 andrewbogott: rebooting tools-harbor-1 in a feeble attempt to get it to work (docker-compose can't restart it) * 23:12 andrew@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-builder * 23:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 23:11 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds.builder * 23:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds.builder * 17:31 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:30 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:13 taavi: reboot tools-sgeexec-10-17 due to high load === 2024-01-08 === * 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-27, tools-sgeweblight-10-28 * 10:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:17 taavi: reboot tools-sgeexec-10-21 === 2024-01-05 === * 14:55 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 14:55 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 11:56 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:55 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:29 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 10:29 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-01-04 === * 10:11 dcaro: deploy toolforge-envvars-cli 0.0.3 === 2024-01-03 === * 21:22 
andrewbogott: truncating 200 logfiles to 5M on tools nfs * 21:17 andrewbogott: deleting many stray core dumps throughout nfs storage === 2024-01-02 === * 11:06 dcaro: restart toolsdb database to flush connections ([[phab:T354176|T354176]]) * 10:42 dcaro: flushed the redis db on tools-harbor-1 ([[phab:T354176|T354176]]) * 10:37 dcaro: hard reboot tools-harbor-1 * 10:13 dhinus: hard reboot tools-harbor-1 === 2024-01-01 === * 15:55 andrewbogott: rebooting tools-harbor-1, [[phab:T354151|T354151]] ==Archives== * [[Nova Resource:Tools/SAL/Archive 1|Archive 1]] (2013-2014) * [[Nova Resource:Tools/SAL/Archive 2|Archive 2]] (2015-2017) * [[Nova Resource:Tools/SAL/Archive 3|Archive 3]] (2018-2019) * [[Nova Resource:Tools/SAL/Archive 4|Archive 4]] (2020-2021) * [[Nova Resource:Tools/SAL/Archive 5|Archive 5]] (2022-2023) </noinclude> {{SAL|Project Name=tools}} <noinclude>[[Category:SAL]]</noinclude> 2026-05-16T00:00:55Z Stashbot taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers === 2026-05-16 === * 00:00 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers === 2026-05-15 === * 19:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 19:02 taavi: rebooting bastions and k8s workers to pick up kernel updates === 2026-05-14 === * 16:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 16:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 15:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 15:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:28 raymond-ndibe@cloudcumin1001: END (PASS) -
Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 13:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 13:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway * 13:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component istio-gateway * 13:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway === 2026-05-13 === * 12:07 godog: resume restarting webservices using default memory requests - [[phab:T420565|T420565]] * 08:46 godog: restart sample webservices with new memory requests https://phabricator.wikimedia.org/P92497 - [[phab:T420565|T420565]] * 08:36 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 08:35 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 00:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 00:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2026-05-12 === * 23:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 23:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 22:55 raymond-ndibe@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 22:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 22:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 21:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2026-05-11 === * 00:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component image-config * 00:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 00:39 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component image-config * 00:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config === 2026-05-07 === * 12:02 taavi: draining tools-k8s-worker-106 to investigate [[phab:T425172|T425172]] === 2026-05-05 === * 04:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 04:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 04:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 03:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 02:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 02:35 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-emailer * 02:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 02:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 01:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2026-04-28 === * 10:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-04-23 === * 15:59 andrewbogott: hard rebooting tools-puppetserver-01.tools, it seems to have crashed * 09:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) * 09:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) * 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) * 09:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node * 09:12 taavi: uninstall ingress-nginx-gen2 from the cluster [[phab:T392356|T392356]] * 08:08 taavi: delete all ingress objects [[phab:T392356|T392356]] === 2026-04-21 === * 14:06 taavi: save backup of all ingress objects to ~taavi/ingresses-backup-2026-04-21.json [[phab:T392356|T392356]] * 13:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component maintain-kubeusers * 13:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 12:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 12:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli === 2026-04-20 === * 15:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 15:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli === 2026-04-16 === * 13:09 taavi: bump istio traffic percentage 75% -> 100% [[phab:T392356|T392356]] === 2026-04-15 === * 10:45 taavi: bump istio traffic percentage 50% -> 75% [[phab:T392356|T392356]] === 2026-04-13 === * 09:11 taavi: bump istio traffic percentage 25% -> 50% [[phab:T392356|T392356]] * 07:33 taavi: bump istio traffic percentage 10% -> 25% [[phab:T392356|T392356]] === 2026-04-10 === * 14:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 14:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 11:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 11:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 08:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 08:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway * 08:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 08:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
istio-gateway === 2026-04-09 === * 14:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 06:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 06:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 06:24 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx * 06:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 06:12 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx * 06:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2026-04-08 === * 17:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 17:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 17:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 15:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 08:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 00:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component logs-api === 2026-04-07 === * 23:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 19:20 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 19:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 19:09 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 19:00 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 18:59 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 18:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 18:43 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 18:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 18:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 18:07 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T361237|T361237]]) * 18:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 17:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 17:53 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 17:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 17:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 17:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 17:18 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 17:04 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 17:03 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 17:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 17:01 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 17:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:59 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 16:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:59 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=97) * 16:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:57 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 16:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:52 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 16:50 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:48 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 16:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 16:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T361237|T361237]]) * 16:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=0) * 15:57 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.etcd.add_node_to_cluster * 15:52 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 15:51 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 15:33 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 15:31 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 15:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 15:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 15:06 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 14:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T361237|T361237]]) * 14:43 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 14:42 andrewbogott: replacing etcd nodes with bookworm-based VMs * 14:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 13:33 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 12:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 09:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 09:51 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
builds-builder * 09:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 09:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 09:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 09:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 08:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 08:57 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 08:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 08:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 08:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 07:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 07:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway === 2026-04-02 === * 17:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 17:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 16:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 16:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 16:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 15:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 14:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:11 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 10:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 10:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:13 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api === 2026-04-01 === * 18:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 18:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 12:33 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 12:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 11:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 11:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 09:57 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 09:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2026-03-31 === * 18:02 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api * 18:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 17:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 17:56 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 12:31 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 12:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2026-03-30 === * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 14:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 12:05 
dcaro: removing wal from prometheus nodes to restart them === 2026-03-26 === * 17:30 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component wmcs-k8s-metrics * 17:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 14:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) for image docker-registry.svc.toolforge.org/cadvisor:0.56.2 * 14:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/cadvisor:0.56.2 * 14:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image docker-registry.svc.toolforge.org/cadvisor:0.56.2 * 14:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/cadvisor:0.56.2 * 10:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 10:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2026-03-25 === * 13:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-package-builder-04 * 13:43 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-package-builder-04 === 2026-03-24 === * 17:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 17:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 13:04 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:51 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-03-23 === * 11:16 taavi: send 10% of traffic to istio 
[[phab:T392356|T392356]] * 10:53 taavi: send 5% of traffic to istio [[phab:T392356|T392356]] * 10:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway === 2026-03-19 === * 20:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes * 17:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 16:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 16:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all nodes * 14:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 11:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing * 09:07 taavi: fixing 2 tools still running ruby2.1 image to use that instead of 'ruby2' in service.manifest * 08:52 taavi: fixing 2 tools still running ruby2.5 image to use that instead of 'ruby25' in service.manifest * 08:49 taavi: fixing 12 tools still running node6 image to use that instead of 'nodejs' in service.manifest * 08:38 taavi: fixing 12 tools still running golang1.11 image to use that instead of 'golang111' in service.manifest * 08:36 taavi: fixing 60 tools still running python3.4 image to use 'python3.4' instead of 'python' in 
service.manifest === 2026-03-18 === * 12:00 taavi: restarting existing web services to backfill HTTPRoute resources [[phab:T392356|T392356]] * 07:37 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 07:37 filippo@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-83.tools.eqiad1.wikimedia.cloud to the cluster * 07:23 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T419824|T419824]]) === 2026-03-17 === * 12:43 taavi: shutdown tools-package-builder-04 [[phab:T401819|T401819]] === 2026-03-15 === * 03:10 andrewbogott: rebooting tools-redis-6, VM is in state ERROR === 2026-03-13 === * 22:04 taavi: reboot tools-bastion-15 [[phab:T420044|T420044]] * 19:06 taavi: reboot tools-bastion-14 [[phab:T420044|T420044]] === 2026-03-12 === * 13:55 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76 * 13:50 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76 === 2026-03-10 === * 11:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli * 11:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli * 09:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-03-09 === * 17:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 17:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 15:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 15:32 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 15:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 15:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway * 13:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a gateway role in the tools cluster * 13:30 taavi@cloudcumin1001: Added a new k8s gateway tools-k8s-gateway-3.tools.eqiad1.wikimedia.cloud to the cluster * 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a gateway role in the tools cluster * 13:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a gateway role in the tools cluster * 13:19 taavi@cloudcumin1001: Added a new k8s gateway tools-k8s-gateway-2.tools.eqiad1.wikimedia.cloud to the cluster * 13:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a gateway role in the tools cluster * 13:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a gateway role in the tools cluster * 13:07 taavi@cloudcumin1001: Added a new k8s gateway tools-k8s-gateway-1.tools.eqiad1.wikimedia.cloud to the cluster * 12:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a gateway role in the tools cluster === 2026-03-06 === * 11:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 11:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 11:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component istio-gateway * 11:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-gateway * 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component istio-system * 11:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component istio-system === 2026-03-05 === * 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api === 2026-03-04 === * 20:10 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 19:58 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:57 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 19:46 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:46 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 19:45 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:44 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 19:30 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:29 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 19:14 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:14 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 19:13 root@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 19:13 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:12 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:12 root@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=97) * 19:11 
root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:11 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:10 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:10 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:09 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:08 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:07 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:07 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:06 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:06 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:05 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:04 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:03 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:03 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:02 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:02 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 19:00 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 19:00 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 18:59 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 18:58 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster (exit_code=99) * 18:57 
root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.add_node_to_cluster * 18:17 root@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 18:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:57 root@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node * 17:38 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 17:18 root@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node * 16:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 16:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 14:54 dcaro: increase object quota to 400k ([[phab:T418528|T418528]]) * 14:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config * 14:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 13:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 13:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 13:42 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 13:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 13:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api === 2026-03-03 === * 20:09 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 20:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 16:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-03-02 === * 17:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 17:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 16:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 16:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 16:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-02-26 === * 15:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component gateway-api * 13:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component gateway-api === 2026-02-25 === * 14:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry (exit_code=0) for Istio 1.29.0 * 
14:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry for Istio 1.29.0 * 14:22 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry (exit_code=99) for Istio 1.29.0 * 14:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry for Istio 1.29.0 * 14:09 taavi: taavi@tools-imagebuilder-2:~$ sudo docker system prune -a # reclaiming disk space * 14:08 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry (exit_code=99) for Istio 1.29.0 * 14:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.copy_istio_images_to_registry for Istio 1.29.0 === 2026-02-24 === * 10:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 10:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 10:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-02-20 === * 20:17 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24 * 20:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24 * 20:16 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24 * 20:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-runner:24 * 19:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry 
(exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} * 19:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} * 19:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} * 19:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} * 19:44 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} * 19:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image toolsbeta-harbor.wmcloud.org/toolforge/heroku-builder:22_{{Gerrit|20240105}} === 2026-02-11 === * 21:49 taavi: remove hiera override still allowing ssh agent forwarding onto toolforge bastions [[phab:T198138|T198138]] === 2026-02-05 === * 19:08 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 18:48 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing * 16:42 volans: re-enabling puppet on NFS workers to update the infra-tracing-nfs === 2026-02-04 === * 15:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 15:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 14:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.14.3 * 14:55 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.14.3 === 2026-02-03 === * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 09:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 08:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.13.7 * 08:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry for image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.13.7 === 2026-01-28 === * 15:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 15:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2026-01-23 === * 01:10 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 01:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2026-01-22 === * 18:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 18:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2026-01-15 === * 08:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2026-01-14 === * 15:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, 
tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 15:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions (exit_code=0) for tools-bastion-15.tools.eqiad1.wikimedia.cloud, tools-bastion-14.tools.eqiad1.wikimedia.cloud ([[phab:T413797|T413797]]) * 15:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions for tools-bastion-15.tools.eqiad1.wikimedia.cloud, tools-bastion-14.tools.eqiad1.wikimedia.cloud ([[phab:T413797|T413797]]) * 15:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 15:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 15:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 15:08 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 15:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 15:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses (exit_code=0) for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T413797|T413797]]) * 14:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T413797|T413797]]) * 14:58 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 14:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 
14:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 14:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 14:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 14:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 14:43 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 
([[phab:T413797|T413797]]) * 14:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112, tools-k8s-worker-113 ([[phab:T413797|T413797]]) * 14:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.30.14 to 1.31.14 ([[phab:T413797|T413797]]) * 14:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade to 1.31.14 ([[phab:T413797|T413797]]) * 13:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade to 1.31.14 ([[phab:T413797|T413797]]) === 2026-01-12 === * 17:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 17:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 17:52 
raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 17:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 17:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 17:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 17:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 17:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 16:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 16:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 16:20 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2026-01-06 === * 15:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 14:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 14:00 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 13:54 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 13:54 andrew@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 13:48 andrewbogott: removing tools-k8s-etcd-24 in prep for rebuilding cloudvirtlocal1003 * 13:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 03:28 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 03:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 03:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 02:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 02:53 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 02:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 02:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 02:39 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 01:59 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 01:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) * 01:48 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 01:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 01:42 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 01:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node ([[phab:T375217|T375217]]) === 2026-01-05 === * 23:17 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 23:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 23:10 
andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 23:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 23:01 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=97) * 22:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) === 2025-12-18 === * 11:13 godog: bump max objects quota to 200k * 11:05 godog: bump object quota to 500G === 2025-12-17 === * 17:54 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Loki 3.6.3, Alloy 1.12.1 ([[phab:T399313|T399313]]) * 17:53 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.12.1 ([[phab:T399313|T399313]]) * 17:53 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.6.3 ([[phab:T399313|T399313]]) * 17:53 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.6.3, Alloy 1.12.1 ([[phab:T399313|T399313]]) === 2025-12-15 === * 13:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 13:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 13:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 13:26 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component wmcs-k8s-metrics * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 12:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) 
([[phab:T412695|T412695]]) * 12:01 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/kube-state-metrics:v2.17.0 ([[phab:T412695|T412695]]) * 12:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T412695|T412695]]) === 2025-12-14 === * 02:14 andrewbogott: running 'kubectl rollout restart -n envvars-admission deployment/envvars-admission' in response to an envvars alert === 2025-12-11 === * 16:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-12-04 === * 21:37 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 21:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 21:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 21:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 20:52 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 20:45 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 20:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 20:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 20:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 20:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 20:03 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 19:56 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 19:56 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 19:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 19:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 19:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) * 19:13 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 19:06 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T361237|T361237]]) === 2025-12-03 === * 19:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 19:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T375217|T375217]]) * 17:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=0) * 17:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T375217|T375217]]) * 17:32 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node (exit_code=99) * 17:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.etcd.depool_and_remove_node ([[phab:T375217|T375217]]) === 2025-12-02 === * 20:22 andrewbogott: stop/starting harbordb1 to fix presumed mtu mismatch * 20:06 andrewbogott: rebooting tools-harbordb1 to aid with host draining * 08:31 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Alloy 1.11.3 ([[phab:T399313|T399313]]) * 08:30 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.11.3 ([[phab:T399313|T399313]]) * 08:30 volans@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.logging.copy_images_to_registry for Alloy 1.11.3 ([[phab:T399313|T399313]]) === 2025-12-01 === * 22:31 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Alloy 1.4.0 ([[phab:T399313|T399313]]) * 22:30 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.4.0 ([[phab:T399313|T399313]]) * 22:30 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Alloy 1.4.0 ([[phab:T399313|T399313]]) * 16:46 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 16:26 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing === 2025-11-27 === * 12:15 volans: [continue] on the haproxy nodes * 12:15 volans: temporarily disabling puppet to deploy gerrit {{Gerrit|1211610}} === 2025-11-26 === * 14:48 volans: enabled infra-tracing-nfs on all nfs workers after testing it on few hosts * 09:46 dhinus: restarting tools-db-6 to apply a config change [[phab:T409922|T409922]] === 2025-11-25 === * 02:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 02:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 02:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 02:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 01:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 01:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 01:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 01:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2025-11-24 === * 10:24 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 10:04 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing === 2025-11-20 === * 18:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-34 * 18:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-34 * 17:13 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component infra-tracing * 16:55 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component infra-tracing * 16:45 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging * 16:36 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 15:56 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging * 15:51 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 15:47 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 15:37 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:37 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 15:28 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:23 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 15:17 
volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:01 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 14:54 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 14:46 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 14:40 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-11-19 === * 16:19 andrewbogott: increased object count quota to 100,000 * 16:03 andrewbogott: increased object storage quota to 200GB * 08:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 08:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2025-11-18 === * 18:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-harbor-2 (cluster eqiad1) * 14:36 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-harbor-2 (cluster eqiad1) * 14:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-prometheus-9 (cluster eqiad1) * 14:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-prometheus-9 (cluster eqiad1) * 14:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-113 (cluster eqiad1) * 14:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-113 (cluster eqiad1) * 14:31 taavi@cloudcumin1001: END (PASS) 
- Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-113 * 14:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-113 * 14:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-112 (cluster eqiad1) * 14:30 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-112 (cluster eqiad1) * 14:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-112 * 14:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-112 * 14:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-nfs-82 (cluster eqiad1) * 14:28 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-nfs-82 (cluster eqiad1) * 14:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-82 * 14:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-82 * 14:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-nfs-81 (cluster eqiad1) * 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-nfs-81 (cluster eqiad1) * 14:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-81 * 14:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-81 * 14:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-worker-nfs-80 (cluster eqiad1) * 14:23 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-worker-nfs-80 (cluster eqiad1) * 14:23 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-80 * 14:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-80 * 14:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-prometheus-8 (cluster eqiad1) * 14:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-prometheus-8 (cluster eqiad1) * 12:05 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:56 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 11:24 volans@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 11:19 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-legacy-redirector-3 (cluster eqiad1) * 10:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-legacy-redirector-3 (cluster eqiad1) * 10:11 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.vps.instance.stop_start (exit_code=97) vm tools-legaci-redirector-3 (cluster eqiad1) * 10:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-legaci-redirector-3 (cluster eqiad1) === 2025-11-17 === * 18:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 18:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 18:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 18:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:10 fnegri@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor ([[phab:T409981|T409981]]) * 10:06 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T409981|T409981]]) === 2025-11-14 === * 16:27 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-4 ([[phab:T409287|T409287]]) * 16:26 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-4 ([[phab:T409287|T409287]]) === 2025-11-13 === * 15:13 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.instance.stop_start (exit_code=99) vm toolsbeta-test-k8s-ingress-12 (cluster eqiad1) * 15:13 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm toolsbeta-test-k8s-ingress-12 (cluster eqiad1) * 15:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-bastion-14 (cluster eqiad1) * 15:08 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-bastion-14 (cluster eqiad1) * 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-haproxy-7 (cluster eqiad1) * 12:25 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-haproxy-7 (cluster eqiad1) * 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.stop_start (exit_code=0) vm tools-k8s-haproxy-8 (cluster eqiad1) * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.vps.instance.stop_start vm tools-k8s-haproxy-8 (cluster eqiad1) === 2025-11-12 === * 15:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 15:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission === 2025-11-11 === * 15:28 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Loki 3.5.7 
([[phab:T399313|T399313]]) * 15:28 volans@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.5.7 ([[phab:T399313|T399313]]) * 15:28 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.5.7 ([[phab:T399313|T399313]]) === 2025-11-10 === * 22:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 22:16 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) * 22:16 raymond-ndibe@cloudcumin1001: Updating container image toolsbeta-harbor.wmcloud.org/toolforge-pre-built/toolforge-bookworm-sssd:latest * 22:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 22:14 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) * 22:14 raymond-ndibe@cloudcumin1001: Updating container image toolsbeta-harbor.wmcloud.org/toolforge-pre-built/toolforge-bookworm-sssd:latest * 22:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 22:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) * 22:08 raymond-ndibe@cloudcumin1001: Updating container image toolsbeta-harbor.wmcloud.org/toolforge-pre-built/toolforge-bookworm-sssd:latest:latest * 22:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 22:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 21:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 21:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 21:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) 
for component image-config * 21:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 21:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 20:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 20:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 20:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 20:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 19:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 19:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 19:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 19:44 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 19:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 19:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 19:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 19:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component envvars-api * 19:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 19:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 19:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 18:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 18:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-11-07 === * 11:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 11:45 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 11:42 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T409287|T409287]]) * 11:35 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T409287|T409287]]) * 11:34 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-7 ([[phab:T409287|T409287]]) * 11:33 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-7 ([[phab:T409287|T409287]]) === 2025-11-06 === * 16:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
jobs-api * 14:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-11-05 === * 19:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 18:40 taavi: taavi@tools-bastion-15:~ $ sudo loginctl terminate-user damian * 14:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 14:53 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-7.tools.eqiad1.wikimedia.cloud ([[phab:T409287|T409287]]) * 14:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T409287|T409287]]) * 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T409287|T409287]]) === 2025-11-04 === * 17:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 17:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 17:26 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 17:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 15:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for 
component api-gateway * 12:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 03:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 01:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-cli * 01:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-cli * 01:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-cli * 00:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-cli * 00:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 00:39 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 00:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 00:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 00:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 00:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 00:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli === 2025-11-03 === * 22:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 
22:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 22:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 22:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 22:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 22:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 22:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 22:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 22:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 22:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 22:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 22:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 21:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 21:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 21:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 21:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 21:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component 
registry-admission * 21:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 21:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 20:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 20:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 20:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer * 20:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 18:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 18:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 11:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 11:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 11:11 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 11:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli === 2025-10-30 === * 18:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:58 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component api-gateway * 16:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 16:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 11:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logs-api * 11:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api * 11:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logs-api === 2025-10-29 === * 18:39 taavi: kick off script to rebuild all pre-built images, including [[phab:T407707|T407707]] * 16:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T408669|T408669]]) * 16:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T408669|T408669]]) * 16:27 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico ([[phab:T408669|T408669]]) * 15:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T408669|T408669]]) * 14:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 12:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.calico.copy_images_to_registry (exit_code=0) for Calico v3.29.6 * 12:48 taavi@cloudcumin1001: Updating container image 
docker-registry.svc.toolforge.org/calico/typha:v3.29.6 * 12:47 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/node:v3.29.6 * 12:47 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/kube-controllers:v3.29.6 * 12:46 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/ctl:v3.29.6 * 12:46 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/calico/cni:v3.29.6 * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.calico.copy_images_to_registry for Calico v3.29.6 * 12:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 12:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 12:37 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 12:22 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico * 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico === 2025-10-28 === * 19:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:07 taavi: delete paws, paws-master security groups, long obsolete as paws is now in a separate project * 16:01 
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 14:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 14:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 10:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2025-10-27 === * 22:09 taavi: copy toolviews database hiera data to a place where haproxy nodes can see them [[phab:T408454|T408454]] * 18:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 11:16 dcaro: removing taskruns/pipelineruns v1beta1 version from the stored list in the crds ([[phab:T408127|T408127]]) === 2025-10-24 === * 20:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-33, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-35, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, 
tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-41, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-44, t * 18:59 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-33, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-35, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-41, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-44, tools-k8s-worker-nfs * 18:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-27 * 17:55 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-27 * 17:36 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-9, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-15, tools-k * 16:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-9, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, 
tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-16, to * 16:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-105, tools-k8s-worker-106 * 16:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-105, tools-k8s-worker-106 * 16:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103 * 16:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103 * 16:19 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-102,tools-k8s-worker-103 * 16:19 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-102,tools-k8s-worker-103 * 13:37 andrewbogott: rebooting clouddumps100[12] for [[phab:T407110|T407110]] === 2025-10-23 === * 14:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 10:11 taavi: deleting old nginx front proxy instances [[phab:T283948|T283948]] * 10:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-10-22 === * 15:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 15:56 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/nginx-ingress-controller:v1.13.3 * 15:56 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.image.copy_to_registry * 15:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 15:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 12:35 taavi: moving toolforge traffic to haproxy directly [[phab:T283948|T283948]] * 07:00 godog: delete tools-nfs-2 - [[phab:T404584|T404584]] === 2025-10-21 === * 08:53 godog: shut down tools-nfs-2 - [[phab:T404584|T404584]] * 07:52 godog: tools-nfs-3 is back - [[phab:T404584|T404584]] * 07:49 godog: resize tools-nfs-3 to match tools-nfs-2 (g4.cores16.ram64.disk20.10xiops) - [[phab:T404584|T404584]] * 00:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 00:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-10-20 === * 16:31 taavi: make logrotate run hourly on haproxy nodes [[phab:T284558|T284558]] === 2025-10-16 === * 12:01 volans@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:52 volans@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 08:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-10-15 === * 08:03 godog: tools-nfs-3 is back - [[phab:T404584|T404584]] * 08:00 godog: resize tools-nfs-3 - [[phab:T404584|T404584]] === 2025-10-14 === * 14:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 14:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:26 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 13:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 13:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 11:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 11:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 11:45 godog: update nfs-tools.wmcloud.org and nfs.svc.toolforge.org proxied to point to tools-nfs-3 === 2025-10-13 === * 14:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:17 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-70, tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-72, tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-75, tools-k8s-worker-nfs-76, tools-k8s-worker-nfs-77, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-79, tools-k8s-worker-nfs-8, tools-k8s-worker-nfs-80, tools-k8s-worker-nfs-81, tools-k8s-worker-nfs-82, too * 09:14 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:14 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:14 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-81 (cluster eqiad1, project tools) * 09:14 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-81 (cluster eqiad1, project tools) * 09:13 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console 
(exit_code=0) * 09:13 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:09 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-69 (cluster eqiad1, project tools) * 09:09 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-69 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-68 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-68 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-67 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-67 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-66 (cluster eqiad1, project tools) * 09:08 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-66 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-65 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-65 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-61 (cluster eqiad1, project tools) * 09:07 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-61 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm 
tools-k8s-worker-nfs-58 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-58 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-57 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-57 (cluster eqiad1, project tools) * 09:06 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-55 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-55 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-54 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-54 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-53 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-53 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-50 (cluster eqiad1, project tools) * 09:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-50 (cluster eqiad1, project tools) * 09:03 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-5 (cluster eqiad1, project tools) * 09:03 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-5 (cluster eqiad1, project tools) * 09:02 
filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-48 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-48 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-47 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-47 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-46 (cluster eqiad1, project tools) * 09:02 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-46 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-45 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-45 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-44 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-44 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-43 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-43 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-42 (cluster eqiad1, project tools) * 09:01 filippo@cloudcumin1001: START - Cookbook 
wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-42 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-41 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-41 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-40 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-40 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-39 (cluster eqiad1, project tools) * 09:00 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-39 (cluster eqiad1, project tools) * 08:59 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-38 (cluster eqiad1, project tools) * 08:59 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-38 (cluster eqiad1, project tools) * 08:58 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:58 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:57 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:57 filippo@cloudcumin1001: START - Cookbook wmcs.vps.instance.force_reboot vm tools-k8s-worker-nfs-37 (cluster eqiad1, project tools) * 08:10 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:05 wmbot~godog@r5: 
END (FAIL) - Cookbook wmcs.nfs.migrate_service (exit_code=99) ([[phab:T404584|T404584]]) * 08:05 wmbot~godog@r5: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:04 filippo@cloudcumin1001: END (FAIL) - Cookbook wmcs.nfs.migrate_service (exit_code=99) ([[phab:T404584|T404584]]) * 08:03 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:03 filippo@cloudcumin1001: END (FAIL) - Cookbook wmcs.nfs.migrate_service (exit_code=99) ([[phab:T404584|T404584]]) * 08:03 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.migrate_service ([[phab:T404584|T404584]]) * 08:01 godog: switch NFS from tools-nfs-2 to tools-nfs-3 - [[phab:T404584|T404584]] * 07:29 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-66, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-76, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-81 === 2025-10-10 === * 09:22 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-1 * 09:10 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-1 === 2025-10-09 === * 08:21 filippo@cloudcumin1001: END (FAIL) - Cookbook wmcs.nfs.add_server (exit_code=99) * 08:15 filippo@cloudcumin1001: START - Cookbook wmcs.nfs.add_server === 2025-10-08 === * 21:47 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-76 * 21:19 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-76 * 12:27 godog: very brief nfs interruption to wrap up [[phab:T347681|T347681]] * 10:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:39 
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 09:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 08:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 08:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 06:55 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-71 * 06:43 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-71 === 2025-10-07 === * 18:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 16:18 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-11 * 16:11 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-11 * 15:18 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-69 * 15:11 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-69 * 14:51 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-65, tools-k8s-worker-nfs-69 * 14:38 wmbot~dcaro@acme: START - 
Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-65, tools-k8s-worker-nfs-69 * 13:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 13:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 12:59 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39 * 12:58 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39 * 12:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 12:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 11:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 10:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 09:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 08:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 08:08 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api * 08:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2025-10-06 === * 
12:06 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-7 * 11:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-7 * 08:19 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found * 08:19 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 08:18 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-76 * 07:39 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-36, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-76 === 2025-10-03 === * 12:51 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-48 * 12:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-6 * 12:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-6 * 12:45 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-48 * 09:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-5 * 09:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-5 === 2025-10-02 === * 13:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) * 13:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node 
(exit_code=0) * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 13:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-haproxy-7 * 13:23 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-haproxy-7 * 13:23 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=99) * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 09:12 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-55 * 08:52 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-42, tools-k8s-worker-nfs-55 === 2025-10-01 === * 10:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 10:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2025-09-30 === * 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 08:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 08:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 08:19 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-67 * 08:06 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-67 === 2025-09-29 === * 13:23 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers 
(exit_code=0) no stuck workers found * 13:23 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 11:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:39 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:35 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:34 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 10:34 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-79 * 07:00 godog: kick stuck nfs workers from clouddumps1001 === 2025-09-28 === * 08:54 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-67, 
tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:35 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-23, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-7, tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:15 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:13 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T405850|T405850]]) * 08:12 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-1 ([[phab:T405850|T405850]]) * 08:10 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-1 ([[phab:T405850|T405850]]) * 08:10 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 08:08 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:08 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 08:08 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:08 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 08:08 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:07 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 08:07 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2025-09-25 === * 18:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 17:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 17:42 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api 
* 17:30 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:27 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 17:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 15:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 12:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 12:04 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 11:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-09-24 === * 20:25 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14 * 20:19 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14 * 17:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 17:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 17:32 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=99) for component jobs-cli * 17:30 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-43 * 17:28 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-43 * 17:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 17:10 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-nfs-43 * 17:07 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-nfs-43 * 16:57 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-73 ([[phab:T400957|T400957]]) * 16:50 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-73 ([[phab:T400957|T400957]]) * 13:49 dcaro: patched all tools with new resource defaults, everything looks good * 13:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:09 dcaro: deployed jobs-api change to default resources, patching existing jobs * 13:08 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-cli * 13:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 12:36 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 12:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 12:28 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 12:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 
12:11 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 12:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 03:54 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12 * 03:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12 === 2025-09-23 === * 20:08 andrewbogott: creating puppetdbpostgres and adding it to tools-puppetdb-2 to store postgres data; the root volume of that VM was filling up and causing widespread puppet issues * 01:55 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 * 01:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 === 2025-09-22 === * 16:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 12:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-09-21 === * 09:17 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-2 * 09:02 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-2 * 03:16 dcaro: acking and silencing CPU capacity alerts to handle on Monday, they should not page * 01:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 01:46 andrew@cloudcumin1001: Added a new k8s worker 
tools-k8s-worker-113.tools.eqiad1.wikimedia.cloud to the cluster * 01:36 andrewbogott: adding additional worker node in response to repeated capacity alerts * 01:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster === 2025-09-19 === * 13:09 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-11 * 13:03 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-11 === 2025-09-18 === * 13:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 13:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 11:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 11:45 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 11:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 11:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 11:29 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-prometheus-9 (cluster eqiad1, project tools) * 11:29 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot vm tools-prometheus-9 (cluster eqiad1, project tools) * 11:29 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:29 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 09:42 wmbot~dcaro@acme: END (PASS) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 09:36 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 09:34 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found * 09:34 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 08:52 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-3 * 08:34 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-3 * 06:47 taavi: delete tools-sgebastion-10 [[phab:T314665|T314665]] === 2025-09-17 === * 13:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-32 * 12:53 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-32 * 09:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:35 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) vm tools-prometheus-9 (cluster eqiad1, project tools) * 09:35 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot vm tools-prometheus-9 (cluster eqiad1, project tools) * 09:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:34 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 08:23 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-66, tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-10 * 08:08 filippo@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-66, tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-10 === 2025-09-16 === * 16:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 16:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 16:21 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 16:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 15:57 taavi: delete tools-sgebastion puppet prefix [[phab:T314665|T314665]] * 15:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:26 taavi: shutdown tools-sgebastion-10 [[phab:T314665|T314665]] * 14:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:15 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-bastion-13 * 14:09 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-bastion-13 * 14:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 13:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:29 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.vps.remove_instance (exit_code=0) for instance tools-bastion-12 * 13:28 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-bastion-12 * 07:11 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-75 * 06:57 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-71, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-75 * 06:57 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 06:51 filippo@cloudcumin1001: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2025-09-15 === * 16:22 taavi: reboot old bastions to kick long-living connections into newer ones * 14:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 14:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:09 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 14:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 12:47 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-66 * 12:35 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-66 === 2025-09-12 === * 08:49 taavi: pointing login.toolforge.org to tools-bastion-15 [[phab:T392510|T392510]] * 08:33 taavi: pointing dev.toolforge.org to tools-bastion-14 [[phab:T392510|T392510]] * 07:14 godog: uncordon tools-k8s-worker-nfs-53 after failed cookbook (?) 
yesterday === 2025-09-11 === * 14:42 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-46 * 14:36 godog: drain/reboot tools-k8s-worker-nfs-46 - [[phab:T404322|T404322]] * 14:36 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-46 * 14:22 andrewbogott: actually I didn't drain tools-k8s-worker-nfs-53 because the alert cleared on its own * 14:21 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-53 * 14:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-53 * 14:21 andrewbogott: draining/rebooting tools-k8s-worker-nfs-53 because of procs in D state * 13:42 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-53 * 13:36 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-53 * 08:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-bastion-14.tools.eqiad1.wikimedia.cloud * 07:59 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-14.tools.eqiad1.wikimedia.cloud * 07:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-bastion-15.tools.eqiad1.wikimedia.cloud * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-15.tools.eqiad1.wikimedia.cloud * 07:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase === 2025-09-10 === * 14:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 14:28 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T403964|T403964]]) * 14:26 
dcaro@cloudcumin1001: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:10 dcaro@cloudcumin1001: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:08 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:45 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:31 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:31 fnegri@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component maintain-kubeusers ([[phab:T403964|T403964]]) * 12:30 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T403964|T403964]]) === 2025-09-09 === * 09:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 09:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 08:55 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.vps.instance.force_reboot (exit_code=0) ([[phab:T404047|T404047]]) * 08:55 wmbot~dcaro@acme: START - Cookbook wmcs.vps.instance.force_reboot ([[phab:T404047|T404047]]) === 2025-09-08 === * 15:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config * 15:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 15:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:16 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.run_tests (exit_code=97) * 12:07 dcaro@cloudcumin1001: START - 
Cookbook wmcs.toolforge.run_tests * 12:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 11:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 11:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 11:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 11:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 11:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions (exit_code=0) for tools-bastion-12.tools.eqiad1.wikimedia.cloud, tools-bastion-13.tools.eqiad1.wikimedia.cloud ([[phab:T402378|T402378]]) * 11:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_bastions for tools-bastion-12.tools.eqiad1.wikimedia.cloud, tools-bastion-13.tools.eqiad1.wikimedia.cloud ([[phab:T402378|T402378]]) * 11:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses (exit_code=0) for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T402378|T402378]]) * 11:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_ingresses for tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 ([[phab:T402378|T402378]]) * 11:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 10:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 10:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, 
tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 10:20 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 10:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 10:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 10:06 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 10:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, 
tools-k * 10:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 ([[phab:T402378|T402378]]) * 10:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 10:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 09:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 ([[phab:T402378|T402378]]) * 09:58 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-68: ([[phab:T402378|T402378]]) * 09:58 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68: ([[phab:T402378|T402378]]) * 09:55 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:50 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:46 
dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wor * 09:42 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, 
tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:40 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=97) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-w * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:38 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:37 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 09:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 
09:32 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-wo * 09:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-10, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-13, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-23, tools-k * 09:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=0) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, 
tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 09:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 09:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 09:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 08:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 08:58 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers (exit_code=99) for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 08:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade_workers for tools-k8s-worker-102, tools-k8s-worker-103, tools-k8s-worker-105, tools-k8s-worker-106, tools-k8s-worker-107, tools-k8s-worker-108, tools-k8s-worker-109, tools-k8s-worker-110, tools-k8s-worker-111, tools-k8s-worker-112 ([[phab:T402378|T402378]]) * 08:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:40 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.29.15 to 1.30.14 
([[phab:T402378|T402378]]) * 08:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:32 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:19 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.29.15 to 1.30.14 ([[phab:T402378|T402378]]) * 08:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:09 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.run_tests (exit_code=99) * 08:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 08:09 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.run_tests (exit_code=97) * 08:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests === 2025-09-06 === * 23:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for 
tools-k8s-worker-nfs-35 * 23:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-35 === 2025-09-05 === * 14:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-09-04 === * 19:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 18:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 08:52 dcaro: added 'disable-ssl' to tools replica.my.cnf === 2025-09-03 === * 17:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:02 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 16:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 15:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:09 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:09 dcaro@cloudcumin1001: START - 
Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 13:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 12:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 12:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 12:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 11:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 11:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 10:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 10:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 09:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 08:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 08:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 08:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-09-02 === * 17:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 17:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 16:46 
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 16:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 15:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 15:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 15:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 15:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 13:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 12:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-29 === * 15:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance abogott-nstesting * 15:08 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance abogott-nstesting === 2025-08-28 === * 16:52 taavi: rebuild tcl, mariadb images on top of trixie [[phab:T400256|T400256]] * 08:48 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 08:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-08-27 === * 18:05 taavi: copy missing aptly packages to trixie-<nowiki>{</nowiki>tools,toolsbeta<nowiki>}</nowiki> * 11:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 11:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2025-08-26 === * 13:42 dcaro: extended object 
storage quota to 100G ([[phab:T402923|T402923]]) * 10:25 dhinus: shut down tools-harbor-1 (no longer used) === 2025-08-25 === * 22:28 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-81 * 22:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-81 === 2025-08-21 === * 12:28 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers * 10:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 07:31 godog: reboot nfs workers to reset processes stuck in D state * 07:28 filippo@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 04:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-80 * 03:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-80 === 2025-08-20 === * 08:09 dcaro: deploy wmcs-k8s-metrics upgrade ([[phab:T362869|T362869]]) === 2025-08-19 === * 15:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 15:08 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component maintain-harbor * 15:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 15:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:57 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder * 14:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:50 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component builds-api * 14:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:48 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:48 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder * 14:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:46 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:44 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:42 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 14:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:37 dcaro: flipped the tools-harbor.wmcloud.org endpoint to point to tools-harbor-2 ([[phab:T350687|T350687]]) * 14:22 Raymond_Ndibe: setting tools-harbor-1 as read-only ([[phab:T350687|T350687]]) * 13:24 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 13:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:21 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 13:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:18 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 13:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:18 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 13:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 09:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-18 === * 21:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 21:07 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362869|T362869]]) * 17:49 dcaro@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/kube-state-metrics:v2.16.0 ([[phab:T362869|T362869]]) * 17:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362869|T362869]]) * 17:48 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362869|T362869]]) * 17:48 dcaro@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/metrics-server:v0.7.2 ([[phab:T362869|T362869]]) * 17:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362869|T362869]]) * 17:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 17:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 17:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 16:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-harbor-2.tools.eqiad1.wikimedia.cloud ([[phab:T350687|T350687]]) * 16:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud ([[phab:T350687|T350687]]) * 16:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 08:35 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-harbor-2.tools.eqiad1.wikimedia.cloud * 08:34 wmbot~dcaro@acme: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud === 2025-08-16 === * 21:16 andrew@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-111 * 21:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-111 * 21:13 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-111 * 21:13 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-111 === 2025-08-15 === * 19:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103 * 19:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103 * 19:21 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-103 * 19:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-103 === 2025-08-14 === * 15:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 15:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 15:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:38 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=99) for tools-k8s-worker-107, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-41 * 11:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-107, tools-k8s-worker-nfs-11, tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-3, tools-k8s-worker-nfs-41 * 11:33 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) no stuck workers found * 11:33 wmbot~dcaro@acme: START 
- Cookbook wmcs.toolforge.k8s.reboot_stuck_workers no stuck workers found * 02:31 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 02:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-13 === * 16:36 dcaro: reverting jobs-api release ([[phab:T401846|T401846]]) * 11:18 taavi: delete tools-prometheus-6, shutdown for a while * 08:51 godog: bounce stashbot * 08:33 godog: refresh machine-id on tools-k8s-worker-[102-103,105-112].tools.eqiad1.wikimedia.cloud,tools-k8s-worker-nfs-[1-3,5,7-14,16-17,19,21-24,26-27,32-48,50,53-55,57-58,61,65-82].tools.eqiad1.wikimedia.cloud === 2025-08-12 === * 16:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component image-config * 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component image-config * 15:34 taavi: building initial trixie based images [[phab:T400255|T400255]] * 12:50 dcaro: redeploy kyverno ([[phab:T394787|T394787]]) * 12:49 dcaro: manually migrate cleanuppolicies.kyverno.io and clustercleanuppolicies.kyverno.io (using kyverno cli) ([[phab:T394787|T394787]]) * 10:01 dcaro: starting upgrade for kyverno ([[phab:T394787|T394787]]) * 10:00 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:54 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:53 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:53 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:52 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers (exit_code=0) for 
tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 09:52 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot_stuck_workers for tools-k8s-worker-103, tools-k8s-worker-nfs-36 * 03:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 03:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli === 2025-08-11 === * 12:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 12:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 12:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 10:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 08:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-harbor-2.tools.eqiad1.wikimedia.cloud * 08:44 dcaro@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud * 08:37 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-58 * 08:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-58 === 2025-08-08 === * 06:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 06:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-07 === * 14:26 andrew@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-67 * 14:20 andrewbogott: draining and rebooting tools-k8s-worker-nfs-67 * 14:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67 * 10:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-06 === * 17:54 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.run_tests (exit_code=0) * 17:53 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.run_tests * 17:41 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component webservice-cli ([[phab:T401014|T401014]]) * 17:41 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli ([[phab:T401014|T401014]]) * 17:39 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component webservice-cli ([[phab:T401014|T401014]]) * 17:38 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli ([[phab:T401014|T401014]]) * 17:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-08-05 === * 16:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 16:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:03 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component webservice-cli * 13:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component webservice-cli * 11:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-24 * 11:16 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-82, tools-k8s-worker-nfs-24 * 09:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 08:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 03:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 03:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 03:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 03:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 03:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 03:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 02:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 02:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 02:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 02:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 02:39 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component registry-admission * 02:36 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 02:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 02:34 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 02:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 02:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 02:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 02:30 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component maintain-harbor * 02:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 02:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 02:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 02:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 02:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 02:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 01:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 01:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 01:10 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 01:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 01:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 01:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 00:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission === 2025-08-04 === * 13:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:05 filippo@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'filippo' in role 'member' ([[phab:T401091|T401091]]) * 13:05 filippo@cloudcumin1001: START - Cookbook wmcs.vps.add_user_to_project for user 'filippo' in role 'member' ([[phab:T401091|T401091]]) * 11:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-79, tools-k8s-worker-nfs-2 * 11:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-79, tools-k8s-worker-nfs-2 === 2025-08-01 === * 03:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-cli === 2025-07-31 === * 16:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging * 16:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging * 15:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy 
for component jobs-api * 04:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 04:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 04:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 04:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 04:17 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli * 04:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli * 04:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 04:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 04:05 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli * 04:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 04:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 04:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli === 2025-07-30 === * 08:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 08:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-07-29 === * 16:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:02 
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74
* 15:56 andrewbogott: draining and restarting tools-k8s-worker-nfs-74
* 15:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74
* 15:44 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58, tools-k8s-worker-nfs-32
* 15:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58, tools-k8s-worker-nfs-32
* 15:32 andrewbogott: draining and restarting tools-k8s-worker-nfs-58 and tools-k8s-worker-nfs-32
* 14:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 14:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 13:16 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 13:16 wmbot~dcaro@acme: Added a new k8s worker-nfs tools-k8s-worker-nfs-82.tools.eqiad1.wikimedia.cloud to the cluster
* 13:06 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 13:06 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 13:06 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 13:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 13:05 wmbot~dcaro@acme: Added a new k8s worker-nfs tools-k8s-worker-nfs-81.tools.eqiad1.wikimedia.cloud to the cluster
* 12:53 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 12:53 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the
tools cluster
* 12:53 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 12:46 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 12:40 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 12:40 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0)
* 12:40 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.quota_increase
* 12:39 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 12:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 12:38 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 12:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 12:31 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster
* 12:31 wmbot~dcaro@acme: Added a new k8s worker-nfs tools-k8s-worker-nfs-80.tools.eqiad1.wikimedia.cloud to the cluster
* 12:22 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 12:22 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 12:22 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 12:22 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 12:18 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 10:00 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a
worker-nfs role in the tools cluster
* 09:54 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 09:54 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 09:54 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 09:54 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=97)
* 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:29 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 09:29 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 09:29 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 09:28 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
* 09:28 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 09:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 09:07 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 09:02 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 09:02 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster
* 09:01 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster
* 08:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 08:59 wmbot~dcaro@acme: Added a new k8s worker tools-k8s-worker-112.tools.eqiad1.wikimedia.cloud to the cluster
* 08:49 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools
cluster
* 08:49 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
* 08:49 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 08:15 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
* 08:15 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 08:15 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
* 08:14 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
=== 2025-07-28 ===
* 20:28 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
* 20:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 20:25 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
* 20:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 20:24 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
* 20:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 20:23 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
* 20:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 19:56 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
* 19:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 19:49 raymond-ndibe@cloudcumin1001: END (FAIL) -
Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
* 19:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 19:44 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli
* 19:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 11:58 taavi: update pywikibot image to 10.2.0 [[phab:T396933|T396933]]
=== 2025-07-26 ===
* 07:16 wmbot~root@toolforge: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-cli
* 07:16 wmbot~root@toolforge: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
=== 2025-07-23 ===
* 18:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 18:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
=== 2025-07-21 ===
* 17:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75
* 17:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75
=== 2025-07-19 ===
* 13:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72
* 13:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72
=== 2025-07-18 ===
* 10:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-44
* 10:34 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-44
=== 2025-07-14 ===
* 12:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-78
* 12:39
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-78
=== 2025-07-13 ===
* 03:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-12
* 03:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-73, tools-k8s-worker-nfs-24, tools-k8s-worker-nfs-12
=== 2025-07-11 ===
* 17:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55
* 09:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-77, tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-37
* 09:25 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-77, tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-37
=== 2025-07-09 ===
* 17:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 17:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 14:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli
* 14:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli
* 10:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component misctools-cli
* 10:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component misctools-cli
* 09:55 dcaro: adding arch arm64 to all toolforge repos ([[phab:T398016|T398016]])
* 09:40 dcaro: added arch arm64 to jessie-tools repo ([[phab:T398016|T398016]])
=== 2025-07-08 ===
* 17:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 17:14 dcaro@cloudcumin1001: START - Cookbook
wmcs.toolforge.component.deploy for component components-api
* 15:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 15:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 12:42 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
* 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
=== 2025-07-07 ===
* 17:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-53
* 17:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-53
* 16:39 dcaro: pushed new ci image docker-registry.svc.toolforge.org/cloud-cicd-py3.11-bookworm-tox:latest
* 16:05 dcaro: clearing images from tools-imagebuilder-2 as it's out of space
* 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 11:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
* 08:26 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
* 08:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
=== 2025-07-06 ===
* 16:38 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75, tools-k8s-worker-nfs-8
* 16:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75, tools-k8s-worker-nfs-8
=== 2025-07-05 ===
* 00:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-57
* 00:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55,
tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-57
* 00:31 andrewbogott: restarting tools-k8s-worker-nfs-55 tools-k8s-worker-nfs-47 tools-k8s-worker-nfs-57, too many D state procs
=== 2025-07-04 ===
* 14:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-24
* 14:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12, tools-k8s-worker-nfs-24
* 13:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36
* 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36
=== 2025-07-03 ===
* 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 16:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 14:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
* 14:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli
* 13:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 13:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 13:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging
* 13:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
* 13:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24
* 13:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24
* 10:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 10:34
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 08:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging
* 08:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
* 08:26 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
* 08:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
=== 2025-07-02 ===
* 13:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-55
* 13:30 andrewbogott: restarting stuck tools tools-k8s-worker-nfs-74 tools-k8s-worker-nfs-39 tools-k8s-worker-nfs-55
* 13:30 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-55
* 10:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 10:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 09:56 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 09:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 09:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 09:16 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 09:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2025-07-01 ===
* 16:39 dcaro@cloudcumin1001: END (PASS) - Cookbook
wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 15:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 15:41 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 15:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component logging
* 15:23 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission
* 15:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 15:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
* 15:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component logging
* 15:15 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component logging
* 14:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 14:31 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-5 ([[phab:T398170|T398170]])
* 14:30 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-5 ([[phab:T398170|T398170]])
* 14:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 14:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 14:10
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 13:51 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 13:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 13:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor
* 13:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 13:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 12:51 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 12:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 12:03 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission
* 11:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 11:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 11:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 10:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 10:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 10:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 10:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 09:56
dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 09:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 09:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
=== 2025-06-30 ===
* 23:01 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-14
* 22:50 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-14
* 13:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-69, tools-k8s-worker-nfs-70
* 13:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-69, tools-k8s-worker-nfs-70
* 10:51 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T398170|T398170]])
* 10:47 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T398170|T398170]])
* 10:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T398170|T398170]])
* 10:46 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T398170|T398170]])
* 10:46 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' ([[phab:T398170|T398170]])
* 10:45 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T398170|T398170]])
* 10:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T398170|T398170]])
* 10:45 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T398170|T398170]])
* 10:44 fnegri@cloudcumin1001: END
(FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-db' ([[phab:T398170|T398170]])
* 10:43 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T398170|T398170]])
=== 2025-06-28 ===
* 10:39 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-24
* 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-43, tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-24
* 10:13 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67,tools-k8s-worker-nfs-43,tools-k8s-worker-nfs-22,tools-k8s-worker-nfs-5,tools-k8s-worker-nfs-24
* 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67,tools-k8s-worker-nfs-43,tools-k8s-worker-nfs-22,tools-k8s-worker-nfs-5,tools-k8s-worker-nfs-24
* 10:12 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67
* 10:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19,tools-k8s-worker-nfs-67
* 10:12 dcaro@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-67
* 10:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67
* 10:08 dcaro: left a tmux running with a script to restart nginx if stuck
* 09:59 dcaro: restarted nginx in tools-static
=== 2025-06-27 ===
* 18:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-46
* 17:58
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-46
=== 2025-06-26 ===
* 16:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 16:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 16:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 16:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 14:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 14:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
* 13:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli
* 12:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 12:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2025-06-25 ===
* 18:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 18:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 17:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 17:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 13:52 chuckonwumelu@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
* 13:50 chuckonwumelu@cloudcumin1001: START - Cookbook
wmcs.toolforge.component.deploy for component components-cli
* 11:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-cli
* 11:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-cli
* 02:18 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-38
* 02:07 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-38
=== 2025-06-24 ===
* 16:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 16:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 15:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-33
* 15:06 andrewbogott: rebooting tools-k8s-worker-nfs-33, stuck processes
* 15:06 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33
* 15:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 15:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 12:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api
* 12:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 12:22 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
* 12:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
=== 2025-06-23 ===
* 09:08 taavi: restrict logging in to tools-sgebastion-10 (aka login-buster) [[phab:T397459|T397459]]
=== 2025-06-22 ===
* 00:09 andrewbogott: rebooting
tools-prometheus-8
=== 2025-06-21 ===
* 16:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-12
* 15:58 andrewbogott: rebooting tools-k8s-worker-nfs-54 tools-k8s-worker-nfs-12, lots of D state
* 15:57 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-12
* 10:09 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 09:27 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 09:27 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
* 09:26 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console
=== 2025-06-19 ===
* 18:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers
* 17:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 17:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 17:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 17:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 13:56 dcaro: reboot tools-sgebastion-10 as it's stuck on NFS for some tools
=== 2025-06-18 ===
* 14:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 14:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 04:22 andrewbogott: rebooting tools-prometheus-8; unreachable
=== 2025-06-16 ===
* 17:41 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli
* 17:38 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli
* 12:45
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39
* 12:39 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39
=== 2025-06-14 ===
* 16:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37
* 16:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37
=== 2025-06-12 ===
* 10:36 dcaro: rebooting tools-prometheus-8 due to the VM having load issues (not responding to ssh)
* 10:34 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 10:28 wmbot~dcaro@acme: START - Cookbook wmcs.openstack.cloudvirt.vm_console
=== 2025-06-11 ===
* 13:39 chuckonwumelu@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 13:33 chuckonwumelu@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
* 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=0) for Loki 3.5.0, Alloy 1.9.1
* 11:18 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/alloy:v1.9.1
* 11:18 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.5.0
* 11:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.5.0, Alloy 1.9.1
* 11:09 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry (exit_code=99) for Loki 3.5.0, Alloy 1.9.1
* 11:09 taavi@cloudcumin1001: Updating container image docker-registry.svc.toolforge.org/grafana/loki:3.5.0
* 11:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.logging.copy_images_to_registry for Loki 3.5.0, Alloy 1.9.1
=== 2025-06-10 ===
* 17:04 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy
(exit_code=0) for component maintain-harbor * 17:00 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 16:41 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 16:28 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 16:26 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer * 16:21 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 15:45 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:33 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:21 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 15:15 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 14:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 14:57 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 11:48 taavi: add AAAA records to tools/toolsbeta-harbor proxies, previous monitoring issues resolved === 2025-06-06 === * 21:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-74 * 21:40 andrewbogott: restarting tools-prometheus-9 and tools-prometheus-8, lots of tools metrics just went dark * 21:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-74 * 18:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 18:20 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for 
component jobs-cli * 15:20 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-5 * 15:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-5 === 2025-06-05 === * 22:24 andrewbogott: running /srv/tools/cleanup.sh on tools-nfs-2 in a screen session, trying to clear disk space alert * 15:06 chuckonwumelu@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:53 chuckonwumelu@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2025-05-30 === * 16:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-46 * 15:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-46 * 15:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-11 * 15:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 15:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 15:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-11 * 15:28 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component components-api * 15:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 07:38 taavi: reboot tools-static-15 to unstuck NFS things === 2025-05-24 === * 12:57 
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-65 * 12:50 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-65 === 2025-05-23 === * 16:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-65 * 16:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-65 * 03:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-43 * 02:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54, tools-k8s-worker-nfs-37, tools-k8s-worker-nfs-43 === 2025-05-22 === * 21:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 21:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 21:17 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-45, tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-55 * 20:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-45, tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-55 * 20:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 19:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 19:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-53, 
tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-21 * 19:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 19:26 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 19:15 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-2, tools-k8s-worker-nfs-53, tools-k8s-worker-nfs-47, tools-k8s-worker-nfs-78, tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-1, tools-k8s-worker-nfs-21 * 19:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 18:15 dcaro: restart tools-static nginx due to nfs hiccup * 08:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-8 * 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-8 * 08:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-7 * 08:01 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-7 * 07:58 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=1) for instance toolsbeta-prometheus-1 * 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance toolsbeta-prometheus-1 * 07:33 taavi: add AAAA record on *.toolforge.org [[phab:T211575|T211575]] === 2025-05-21 === * 15:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-proxy-10.tools.eqiad1.wikimedia.cloud * 15:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-proxy-9.tools.eqiad1.wikimedia.cloud * 15:24 taavi@cloudcumin1001: START - 
Cookbook wmcs.vps.refresh_puppet_certs on tools-proxy-10.tools.eqiad1.wikimedia.cloud * 15:24 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-proxy-9.tools.eqiad1.wikimedia.cloud * 13:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 13:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase * 09:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-prometheus-9.tools.eqiad1.wikimedia.cloud * 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-prometheus-9.tools.eqiad1.wikimedia.cloud * 09:27 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/busybox:1.35 * 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/bitnami-kubectl:1.30.2 * 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-reports-controller:v1.13.6 * 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-cleanup-controller:v1.13.6 * 09:26 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-background-controller:v1.13.6 * 09:25 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyvernopre:v1.13.6 * 09:25 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno-cli:v1.13.6 * 09:25 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno:v1.13.6 * 09:25 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:04 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 09:04 wmbot~dcaro@acme: Updating container image 
docker-registry.tools.wmflabs.org/busybox:1.35 * 09:04 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2 * 09:04 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6 * 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6 * 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6 * 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6 * 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6 * 09:03 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6 * 09:03 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 08:54 dcaro: deployed the new dns entry for docker-registry.svc.toolforge.org (might take some time to refresh) * 08:47 dcaro: deleting docker-registry.svc.toolforge.org proxy to use dns entry to floating ip instead === 2025-05-20 === * 19:40 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 19:40 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/busybox:1.35 * 19:40 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2 * 19:40 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6 * 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6 * 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6 * 19:39 wmbot~dcaro@acme: Updating container 
image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6 * 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6 * 19:39 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6 * 19:39 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 17:18 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 17:18 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/busybox:1.35 * 17:18 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2 * 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.13.6 * 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.13.6 * 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.13.6 * 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6 * 17:17 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6 * 17:16 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6 * 17:16 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 16:11 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 16:11 wmbot~dcaro@acme: Updating container image docker-registry.svc.toolforge.org/toolforge-kyverno-kyverno:v1.13.6 * 16:11 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 15:48 wmbot~dcaro@acme: END (PASS) - Cookbook 
wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 15:48 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/busybox:1.35 * 15:47 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.30.2 * 15:46 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports:v1.13.6 * 15:46 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup:v1.13.6 * 15:45 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background:v1.13.6 * 15:45 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.13.6 * 15:44 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.13.6 * 15:44 wmbot~dcaro@acme: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.13.6 * 15:44 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 15:01 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 15:00 wmbot~dcaro@acme: Updating container image toolforge-kyverno-kyverno:v1.13.6 * 15:00 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 14:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 14:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 14:59 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 14:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 14:58 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=97) * 14:58 wmbot~dcaro@acme: Updating container image toolforge-kyverno-kyverno:v1.13.6 * 
14:58 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 13:57 taavi: disable host-based authentication in sshd config, not used since grid shutdown * 13:08 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-prometheus-7 * 13:07 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-prometheus-7 * 13:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-prometheus-7 * 13:05 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-prometheus-7 * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-prometheus-8.tools.eqiad1.wikimedia.cloud * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-prometheus-8.tools.eqiad1.wikimedia.cloud * 09:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 09:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase === 2025-05-19 === * 08:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 08:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers === 2025-05-16 === * 18:58 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 18:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 17:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-9 * 17:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor ([[phab:T394520|T394520]]) * 16:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-67, tools-k8s-worker-nfs-9 * 16:51 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T394520|T394520]]) * 16:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor ([[phab:T394520|T394520]]) * 16:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor ([[phab:T394520|T394520]]) * 16:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T394520|T394520]]) * 16:44 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 16:44 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 16:43 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 16:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 12:08 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 12:07 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers === 2025-05-14 === * 17:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy 
for component jobs-api * 17:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 17:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 14:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 10:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 09:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 08:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 08:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2025-05-13 === * 15:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 15:33 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 07:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 * 07:37 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-36 * 07:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 === 2025-05-12 === * 19:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 19:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:18 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component builds-cli * 16:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-cli * 13:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:23 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 13:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:04 arturo: add container image to docker registry docker-registry.tools.wmflabs.org/tofu-provisioning:20250512 ([[phab:T393686|T393686]]) * 11:51 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 11:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 11:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 10:32 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 10:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 09:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 09:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 08:59 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 08:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 02:36 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19 * 02:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19 === 2025-05-10 === * 17:35 lucaswerkmeister: root@tools-bastion-13:~# systemctl restart sssd-sudo<nowiki>{</nowiki>,.socket<nowiki>}</nowiki> # looks like the reset-failed didn’t work properly, systemd didn’t even try to start the service again afaict ([[phab:T393732|T393732]]) * 17:34 lucaswerkmeister: root@tools-bastion-13:~# systemctl reset-failed sssd-<nowiki>{</nowiki>pam,sudo<nowiki>}</nowiki>.service && systemctl restart sssd-pam<nowiki>{</nowiki>,-priv<nowiki>}</nowiki>.socket # try to reset the rate limits this way ([[phab:T393732|T393732]]) * 16:22 lucaswerkmeister: systemctl restart sssd-<nowiki>{</nowiki>pam<nowiki>{</nowiki>,-priv<nowiki>}</nowiki>,sudo<nowiki>}</nowiki>.socket # service-start-limit-hit, [[phab:T393732|T393732]]? * 14:10 lucaswerkmeister: root@tools-bastion-13:~# systemctl restart sssd-sudo.socket # service-start-limit-hit, [[phab:T393732|T393732]]? * 11:53 lucaswerkmeister: [[phab:T393732|T393732]] note: restart of sssd-pam.service actually failed, “may be requested by dependency only”; overall it still seems to have worked though (so next time restarting the sockets is probably sufficient) * 11:52 lucaswerkmeister: root@tools-bastion-13:~# systemctl restart sssd-pam<nowiki>{</nowiki>,<nowiki>{</nowiki>,-priv<nowiki>}</nowiki>.socket<nowiki>}</nowiki> # all three failed with start-limit-hit / Start request repeated too quickly; [[phab:T393732|T393732]]? 
=== 2025-05-09 === * 12:31 arturo: hard-reboot tools-bastion-13 (login.toolforge.org) because unresponsive (out of memory) -- previous reboot was for tools-bastion-12 (dev.t.o) by mistake * 12:29 arturo: hard-reboot tools-bastion-12 (login.toolforge.org) because unresponsive (out of memory) * 07:10 taavi: kill bunch of unwanted processes off of tools-bastion-13 [[phab:T393732|T393732]], please run your things as jobs === 2025-05-08 === * 17:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 17:40 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 17:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 17:39 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 17:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 17:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 17:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 16:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 16:48 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 16:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 16:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 16:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 16:46 raymond-ndibe@cloudcumin1001: END (ERROR) - 
Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component envvars-admission * 16:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 13:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 13:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 09:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:24 taavi: root@tools-bastion-13:~# systemctl restart sssd-sudo.socket # was in failed state * 08:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-05-07 === * 18:00 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-legacy-redirector-2 * 17:58 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-legacy-redirector-2 * 16:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 14:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 12:58 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 12:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 12:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 11:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 10:36 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api * 10:30 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 09:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 09:40 taavi: remove 'roots' ldap sudo policy [[phab:T392797|T392797]] * 09:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:33 dcaro: released jobs-cli 16.1.12 * 09:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 09:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli === 2025-05-06 === * 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:21 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 16:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
* 15:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 15:24 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
* 15:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 13:21 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
* 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 13:12 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
* 13:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 12:55 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway
* 12:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 12:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-69
* 12:10 dcaro: rebooting tools-k8s-worker-nfs-69 due to some stuck processes
* 12:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-69
=== 2025-05-04 ===
* 11:12 dcaro: deleting tools-services-05, has been off for a year (replaced with 06)
=== 2025-05-02 ===
* 18:37 taavi: add elasticsearch credential for tools.techcontribs [[phab:T393209|T393209]]
* 13:55 taavi: reboot tools-static-15
=== 2025-04-28 ===
* 13:07 dhinus: tools-db-4: systemctl stop mariadb && systemctl start mariadb [[phab:T392596|T392596]]
* 13:06 dhinus: tools-db-5: systemctl stop mariadb && systemctl start mariadb [[phab:T392596|T392596]]
* 13:05 dhinus: tools-db-5: systemctl stop mariadb && systemctl start mariadb [[phab:T318479|T318479]]
=== 2025-04-24 ===
* 23:09 bd808: `systemctl stop sssd; rm -rf /var/lib/sss/db/*; systemctl restart sssd` on tools-bastion-12
* 23:03 bd808: `sss_cache -E` on tools-bastion-12 after seeing "sudo: PAM account management error: Authentication service cannot retrieve authentication info"
* 18:50 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli
* 18:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
* 18:38 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-cli
* 18:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
* 18:32 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-cli
* 18:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
* 11:51 taavi: add missing ICMPv6 security group rule to 'default' group
* 08:02 taavi: add an AAAA record for toolserver.org [[phab:T392506|T392506]]
=== 2025-04-23 ===
* 19:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58
* 19:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58
* 15:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-legacy-redirector-3.tools.eqiad1.wikimedia.cloud
* 15:55 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-3.tools.eqiad1.wikimedia.cloud
* 15:10 arturo: give `tools-tofu` bot account member powers for https://gitlab.wikimedia.org/repos/cloud/toolforge/tofu-provisioning
* 13:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 13:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 11:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 11:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 07:02 taavi: rebooting tools-mail-4 with stuck NFS handles
=== 2025-04-21 ===
* 09:52 taavi: update pywikibot-scripts-stable image to v10.0.0 [[phab:T385400|T385400]]
=== 2025-04-17 ===
* 16:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 16:45 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
=== 2025-04-16 ===
* 19:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 19:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 19:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api
* 19:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 15:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 14:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
=== 2025-04-15 ===
* 13:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 13:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 11:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 11:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2025-04-11 ===
* 21:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 21:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 20:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 20:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2025-04-10 ===
* 15:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76
* 15:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76
=== 2025-04-09 ===
* 21:35 bd808: Removed rook and sstefanova from https://gitlab.wikimedia.org/groups/toolforge-repos/ owners (both offboarded former WMCS staff)
* 10:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 10:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
=== 2025-04-08 ===
* 15:17 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
* 15:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 02:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 02:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
=== 2025-04-07 ===
* 19:26 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
* 19:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 13:48 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade
(exit_code=0) for node tools-k8s-ingress-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:47 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:40 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:33 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-109 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:32 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-109 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:11 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:10 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:10 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:08 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:08 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-79 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:07 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:07 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-79 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:07 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-78 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:06 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-78 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-77 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:04 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:04 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:04 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:04 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-77 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:04 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-76 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:03 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:03 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:03 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:03 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:02 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-76 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-75 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:02 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:02 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:01 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:01 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-75 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-74 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:01 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:00 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:00 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.28.14 to 1.29.15
([[phab:T390214|T390214]])
* 13:00 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 13:00 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:59 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-74 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:59 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-73 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:59 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:59 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:58 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:58 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-73 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:58 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-72 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:58 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:57 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:56 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-72 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:56 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-71 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:56 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-71 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-70 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:55 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:54 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:54 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:53 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-70 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:53 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:53 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:53 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:53 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:53 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-69 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:52 fnegri@cloudcumin1001: END (PASS) - Cookbook
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:51 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:51 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-69 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-68 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:50 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-111 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-68 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-67 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-111 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-110 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:49 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:49 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:48 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:48 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:47 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-67 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:47 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-110 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-66 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:47 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:47 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-66 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-65 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:45 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:44 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-65 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:44 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:44 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:43 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:43 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:43 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:43 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:42 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:42 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:42 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:41 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:41 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-104 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:41 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:40 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:40 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:38 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:37 fnegri@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:30 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:22 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:22 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:15 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 12:07 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 11:57 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 11:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 11:54 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.28.14 to 1.29.15 ([[phab:T390214|T390214]])
* 08:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 08:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
* 07:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 07:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 05:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli
* 05:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli
=== 2025-04-06 === * 02:12 andrewbogott: truncating large logfiles on tools nfs === 2025-04-04 === * 10:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 09:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 09:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 09:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 09:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 09:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 09:21 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer * 09:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 09:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 09:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 08:17 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 08:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 08:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 07:51 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 07:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 07:23 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 07:03 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 07:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 02:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes === 2025-04-03 === * 22:26 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all nodes * 22:25 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43 * 22:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43 * 22:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14 * 22:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14 * 22:22 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-33 * 22:17 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43 * 22:16 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33 * 22:13 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-71 * 22:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43 * 22:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-70, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-74 * 22:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-71 * 21:52 andrew@cloudcumin1001: START - 
Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-70, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-74 * 21:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 * 21:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 * 20:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 20:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 08:51 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13 * 08:46 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13 === 2025-04-02 === * 20:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-55 * 20:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68, tools-k8s-worker-nfs-55 * 12:42 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-48 * 12:37 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-48 === 2025-04-01 === * 14:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45 * 13:59 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-41 * 13:56 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45 * 13:55 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 * 13:54 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-41 * 13:49 fnegri@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24 === 2025-03-31 === * 12:48 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 12:42 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 * 12:03 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76 * 11:58 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76 * 09:04 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 * 08:59 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 === 2025-03-28 === * 16:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 * 16:40 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 * 13:58 taavi: reboot tools-static-15 due to stuck nginx worker processes * 10:10 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T389733|T389733]]) * 10:00 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T389733|T389733]]) * 09:42 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor ([[phab:T389733|T389733]]) * 09:30 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor ([[phab:T389733|T389733]]) === 2025-03-27 === * 17:34 root@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-33 * 17:26 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-40, tools-k8s-worker-nfs-33 * 17:26 root@cloudcumin1001: END (ERROR) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=97) for all NFS workers * 15:59 root@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 15:53 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for all NFS workers * 15:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 15:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 15:02 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-111.tools.eqiad1.wikimedia.cloud to the cluster * 14:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 * 14:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 14:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 * 14:33 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 * 14:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17, tools-k8s-worker-nfs-21, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-72 === 2025-03-25 === * 15:32 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:18 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-2 * 13:58 andrewbogott: rebooting tools-k8s-worker-nfs-2 * 13:58 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-2 * 10:32 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 10:32 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 08:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx * 08:39 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-nginx * 08:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2025-03-24 === * 18:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 18:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 18:24 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder * 18:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 18:16 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-builder * 18:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 17:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 17:40 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 17:35 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api * 17:35 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 17:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:05 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 09:59 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 === 2025-03-22 === * 04:00 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 * 03:55 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 * 03:51 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68 * 03:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68 === 2025-03-20 === * 14:04 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.add_user_to_project (exit_code=0) for user 'chuckonwumelu' in role 'member' * 14:04 aborrero@cloudcumin1001: START - Cookbook wmcs.vps.add_user_to_project for user 'chuckonwumelu' in role 'member' === 2025-03-18 === * 15:23 arturo: hard-reboot tools-prometheus-6, not responding to ssh * 10:35 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 10:30 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 * 10:03 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T383238|T383238]]) * 09:57 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T383238|T383238]]) === 2025-03-17 === * 19:01 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 19:00 
wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 18:42 wmbot~dcaro@acme: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:41 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:37 wmbot~dcaro@acme: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:36 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:32 wmbot~dcaro@acme: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 18:32 wmbot~dcaro@acme: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 ([[phab:T383238|T383238]]) * 14:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 ([[phab:T388965|T388965]]) * 14:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 ([[phab:T388965|T388965]]) === 2025-03-16 === * 11:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 === 2025-03-15 === * 15:31 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 15:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-34, tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 15:14 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for 
tools-k8s-worker-nfs-16,tools-k8s-worker-nfs-34,tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 15:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16,tools-k8s-worker-nfs-34,tools-k8s-worker-nfs-77 ([[phab:T388965|T388965]]) * 12:55 dcaro: there was an NFS hiccup that made the NFS checks fail for a second and some workers get stuck for a bit [[phab:T388965|T388965]] === 2025-03-13 === * 22:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 22:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 18:14 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T362868|T362868]]) * 18:04 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T362868|T362868]]) * 18:00 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api ([[phab:T362868|T362868]]) * 17:50 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api ([[phab:T362868|T362868]]) * 17:40 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission ([[phab:T362868|T362868]]) * 17:29 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission ([[phab:T362868|T362868]]) * 17:27 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission ([[phab:T362868|T362868]]) * 17:17 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission ([[phab:T362868|T362868]]) * 17:14 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api 
([[phab:T362868|T362868]]) * 17:05 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362868|T362868]]) * 16:46 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission ([[phab:T362868|T362868]]) * 16:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission ([[phab:T362868|T362868]]) * 16:25 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission ([[phab:T362868|T362868]]) * 16:14 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission ([[phab:T362868|T362868]]) * 10:17 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 * 10:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 === 2025-03-12 === * 17:56 dhinus: aptly repo remove bookworm-tools helmfile, removing custom version that is older than the one from apt.w.o * 03:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-03-11 === * 17:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 17:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 14:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:31 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-cli * 14:27 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 14:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:58 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 10:46 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission === 2025-03-10 === * 20:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 20:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 20:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 20:20 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 20:09 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 20:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 20:05 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 20:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 19:59 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 19:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:55 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 19:51 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:50 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 19:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 19:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 18:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 17:44 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 17:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder === 2025-03-07 === * 13:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-5 * 13:18 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-5 === 2025-03-06 === * 13:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 12:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 12:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 12:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 12:15 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 12:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission === 2025-03-05 === * 19:16 dhinus: systemctl restart prometheus@tools on tools-prometheus-7 (the two prom hosts are returning 
different values) * 17:45 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362868|T362868]]) * 17:44 fnegri@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.2 ([[phab:T362868|T362868]]) * 17:44 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362868|T362868]]) * 16:06 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 16:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 09:13 dcaro: restarting ingress pods due to ingress timing out sometimes * 08:09 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 08:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2025-03-04 === * 20:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:47 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 20:28 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 15:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:01 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362868|T362868]]) * 14:01 fnegri@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.12.0 ([[phab:T362868|T362868]]) * 14:01 
fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362868|T362868]]) * 13:51 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:42 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:40 dhinus: reboot tools-legacy-redirector-2 (http probes failing more than usual) * 12:50 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 12:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 10:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 10:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 09:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 09:15 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 09:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 08:58 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-03-03 === * 17:04 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 16:55 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 16:18 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 16:09 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 13:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld 
* 13:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 13:10 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 13:01 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 11:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 11:15 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 09:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2025-03-01 === * 19:08 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 19:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 * 16:26 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 * 16:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 * 15:51 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 * 15:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 === 2025-02-27 === * 16:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 14:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component 
builds-builder * 14:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder === 2025-02-26 === * 14:22 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 14:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:05 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-02-25 === * 19:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 === 2025-02-24 === * 21:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 21:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 21:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 20:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 20:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 20:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-02-21 === * 12:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 * 12:52 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 === 2025-02-20 === * 13:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer 
([[phab:T320284|T320284]]) * 13:18 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer ([[phab:T320284|T320284]]) === 2025-02-19 === * 20:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55 * 20:25 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55 * 20:25 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37 * 20:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37 === 2025-02-18 === * 17:47 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-54 * 17:31 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-54 * 16:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 * 16:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 * 15:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103, tools-k8s-worker-108, tools-k8s-control-7 ([[phab:T380679|T380679]]) * 15:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103, tools-k8s-worker-108, tools-k8s-control-7 ([[phab:T380679|T380679]]) * 15:03 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 ([[phab:T380679|T380679]]) * 15:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 ([[phab:T380679|T380679]]) === 2025-02-17 === * 17:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 17:32 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
=== 2025-02-10 ===
* 12:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 12:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 12:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 12:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 12:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 12:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
=== 2025-02-09 ===
* 16:38 andrewbogott: rebooting tools-db-4 just in case that helps with the recurring DB crashes
=== 2025-02-07 ===
* 20:51 arturo: resize tools-legacy-redirector to have 2 vCPU [[phab:T385908|T385908]]
* 17:58 andrewbogott: "SET GLOBAL read_only=OFF; " on tools-db-4; both -5 and -4 were set to read only. No idea why or how...
* 01:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7 * 01:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7 * 01:28 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-07 * 01:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-07 * 01:27 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-07 * 01:27 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-07 === 2025-02-06 === * 17:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 17:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld * 15:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 15:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 14:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 14:50 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 14:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 14:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 14:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 14:36 raymond-ndibe@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 14:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:06 andrewbogott: cold-migrating tools-proxy-8 for [[phab:T385264|T385264]]; will cause a brief toolforge outage * 14:05 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 14:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 14:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:39 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 13:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:15 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 13:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 13:06 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-admission * 13:02 raymond-ndibe@cloudcumin1001: START - 
Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 12:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 12:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 12:37 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 12:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 12:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 12:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2025-02-03 === * 14:40 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-haproxy-5, tools-k8s-haproxy-6 * 14:40 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-haproxy-5, tools-k8s-haproxy-6 * 13:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-9, tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 * 13:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-9, tools-k8s-ingress-7, tools-k8s-ingress-8, tools-k8s-ingress-9 * 13:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 * 13:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 * 13:23 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-7 * 13:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 === 2025-02-01 === * 15:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for 
tools-k8s-worker-108 * 15:05 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-108 * 15:05 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-107 * 15:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-107 * 15:04 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-106 * 15:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-106 * 15:03 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-105 * 15:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-105 * 15:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-103 * 15:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-103 * 15:01 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-102 * 15:01 andrewbogott: rebooting all k8s (non-nfs) worker nodes for [[phab:T385264|T385264]] * 15:00 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-102 * 14:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 14:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 * 14:56 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 * 14:55 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 * 14:55 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-71 * 14:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-71 * 14:53 andrew@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-66 * 14:48 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-66 * 14:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54 * 14:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54 * 14:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50 * 14:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50 * 14:46 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-47 * 14:45 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-47 * 14:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-46 * 14:44 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-46 * 14:43 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45 * 14:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45 * 14:42 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43 * 14:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43 * 14:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-40 * 14:40 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-40 * 14:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-39 * 14:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-39 * 14:38 
andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-3 * 14:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-3 * 14:37 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-32 * 14:36 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-32 * 14:36 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 * 14:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24 * 14:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-1 * 14:34 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1 * 14:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17 * 14:33 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17 * 14:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14 * 14:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14 * 14:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13 * 14:30 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13 * 14:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12 * 14:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12 * 14:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-11 * 14:29 andrewbogott: rebooting all k8s-nfs worker nodes for [[phab:T385264|T385264]] * 14:28 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-11 * 14:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 * 14:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 * 14:22 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 * 14:21 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 * 14:20 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-10 * 14:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-10 === 2025-01-31 === * 11:04 dhinus: systemctl restart prometheus@tools on tools-prometheus-7 [[phab:T385262|T385262]] === 2025-01-29 === * 01:10 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli === 2025-01-27 === * 16:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 15:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 15:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 13:52 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 13:52 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 13:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component jobs-api * 13:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 13:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 13:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-01-26 === * 22:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 * 22:04 andrewbogott: restarting Node tools-k8s-worker-nfs-44 , too many D processes * 22:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 * 22:02 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-m8s-worker-nfs-44 * 22:02 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-m8s-worker-nfs-44 * 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud * 08:37 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud * 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:37 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-79.tools.eqiad1.wikimedia.cloud to the cluster * 08:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T384790|T384790]]) * 08:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster 
* 08:26 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-78.tools.eqiad1.wikimedia.cloud to the cluster * 08:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T384790|T384790]]) * 08:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:16 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-77.tools.eqiad1.wikimedia.cloud to the cluster * 08:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T384790|T384790]]) * 08:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 08:06 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-110.tools.eqiad1.wikimedia.cloud to the cluster * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster ([[phab:T384790|T384790]]) * 07:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 07:56 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-109.tools.eqiad1.wikimedia.cloud to the cluster * 07:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster ([[phab:T384790|T384790]]) * 07:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-55 * 07:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-14, tools-k8s-worker-nfs-55 === 2025-01-24 === * 10:39 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-41 * 10:34 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-41 === 2025-01-23 === * 
14:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:39 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 14:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 14:10 dcaro: reboot tools-static-15 due to nginx stuck on nfs === 2025-01-22 === * 17:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23 * 17:36 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23 === 2025-01-18 === * 15:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7 * 15:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7 === 2025-01-17 === * 15:52 dhinus: reboot tools-legacy-redirector-2 (http probes were failing) === 2025-01-15 === * 04:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor * 04:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 03:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2025-01-13 === * 21:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-47 ([[phab:T383625|T383625]]) * 21:31 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-47 ([[phab:T383625|T383625]]) * 21:30 andrew@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38 ([[phab:T383625|T383625]]) * 21:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-19 ([[phab:T383238|T383238]]) * 21:25 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38 ([[phab:T383625|T383625]]) * 21:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-74 ([[phab:T383625|T383625]]) * 21:24 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-19 ([[phab:T383238|T383238]]) * 21:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-74 ([[phab:T383625|T383625]]) * 21:19 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 21:18 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 21:18 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-21 ([[phab:T383238|T383238]]) * 21:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 ([[phab:T383625|T383625]]) * 21:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 ([[phab:T383625|T383625]]) * 21:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 ([[phab:T383238|T383238]]) * 21:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-2 ([[phab:T383238|T383238]]) * 21:14 andrew@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for tools-k8s-worker-nfs-75 ([[phab:T383238|T383238]]) * 21:13 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-75 ([[phab:T383238|T383238]]) * 21:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45 ([[phab:T383625|T383625]]) * 21:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-2 ([[phab:T383238|T383238]]) * 21:08 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 21:05 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45 ([[phab:T383625|T383625]]) * 21:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 21:03 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-13 ([[phab:T383238|T383238]]) * 20:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-13 ([[phab:T383238|T383238]]) * 20:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16 ([[phab:T383238|T383238]]) * 20:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36 ([[phab:T383625|T383625]]) * 20:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16 ([[phab:T383238|T383238]]) * 20:53 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 20:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 ([[phab:T383625|T383625]]) * 20:49 dcaro: restart prometheus to pick up the new ips for vms and such * 20:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 20:47 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 20:47 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-8 * 20:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 ([[phab:T383625|T383625]]) * 20:43 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-20 ([[phab:T383625|T383625]]) * 20:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20 ([[phab:T383625|T383625]]) * 20:42 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-20 ([[phab:T383238|T383238]]) * 20:42 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20 ([[phab:T383238|T383238]]) * 20:42 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 20:41 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-8 * 20:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 20:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 20:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 * 20:36 lucaswerkmeister: restore root-owned /tmp/framer.txt on tools-sgebastion-10, tools-bastion-12, tools-bastion-13 (cf. 
2025-01-05 log entry) following bastion reboots === 2025-01-12 === * 09:53 taavi: hard reboot tools-k8s-worker-nfs-55 === 2025-01-08 === * 18:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-43 ([[phab:T383238|T383238]]) * 18:34 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-43 ([[phab:T383238|T383238]]) * 18:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-32 ([[phab:T383238|T383238]]) * 18:26 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-32 ([[phab:T383238|T383238]]) * 18:19 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17 ([[phab:T383238|T383238]]) * 18:14 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17 ([[phab:T383238|T383238]]) * 18:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 18:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-1 ([[phab:T383238|T383238]]) * 18:12 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-47 ([[phab:T383238|T383238]]) * 18:06 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-47 ([[phab:T383238|T383238]]) * 18:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-41 ([[phab:T383238|T383238]]) * 18:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-41 ([[phab:T383238|T383238]]) * 18:04 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-8 ([[phab:T383238|T383238]]) * 17:59 andrew@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-8 ([[phab:T383238|T383238]]) * 17:59 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-27 ([[phab:T383238|T383238]]) * 17:53 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-27 ([[phab:T383238|T383238]]) * 17:53 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-67 ([[phab:T383238|T383238]]) * 17:48 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-67 ([[phab:T383238|T383238]]) * 17:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37 ([[phab:T383238|T383238]]) * 17:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37 ([[phab:T383238|T383238]]) * 17:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-26 ([[phab:T383238|T383238]]) * 17:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-26 ([[phab:T383238|T383238]]) * 17:34 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-76 ([[phab:T383238|T383238]]) * 17:28 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-76 ([[phab:T383238|T383238]]) * 17:27 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44 ([[phab:T383238|T383238]]) * 17:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44 ([[phab:T383238|T383238]]) * 17:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-12 ([[phab:T383238|T383238]]) * 17:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-12 
([[phab:T383238|T383238]]) * 17:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-48 ([[phab:T383238|T383238]]) * 17:01 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-48 ([[phab:T383238|T383238]]) * 16:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 16:52 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-57 ([[phab:T383238|T383238]]) * 16:51 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-65 ([[phab:T383238|T383238]]) * 16:45 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-65 ([[phab:T383238|T383238]]) * 16:38 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 ([[phab:T383238|T383238]]) * 16:33 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 ([[phab:T383238|T383238]]) * 16:25 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 16:20 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-35 ([[phab:T383238|T383238]]) * 16:00 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 15:55 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-58 ([[phab:T383238|T383238]]) * 15:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-36 * 15:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-36 * 15:40 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38
* 15:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-38
* 15:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-42
* 15:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-42
* 15:29 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22
* 15:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-22
* 15:09 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45
* 15:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45
* 14:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70
* 14:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70
* 14:25 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-70
* 14:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-70
* 14:16 dcaro: reboot tools-static-15 nfs is stuck
=== 2025-01-07 ===
* 00:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 00:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics
* 00:14 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api
* 00:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api
* 00:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component
components-api * 00:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 00:09 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 00:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 00:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor === 2025-01-06 === * 23:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 23:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 23:56 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-harbor * 23:55 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor * 23:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 23:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 23:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 23:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 23:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 23:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 23:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for 
component maintain-harbor
* 16:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
=== 2025-01-05 ===
* 18:58 lucaswerkmeister: remove /tmp/framer.txt on tools-bastion-13 (I notified the owner privately), and replace it with a root-owned file to prevent iTerm from leaking logs into it (https://iterm2.com/downloads/stable/iTerm2-3_5_11.changelog) on tools-sgebastion-10, tools-bastion-12 and tools-bastion-13
=== 2025-01-03 ===
* 21:46 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-69
* 21:41 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-69
* 21:40 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-69
* 21:35 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-69
=== 2025-01-02 ===
* 02:28 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-61
* 02:22 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-61
=== 2025-01-01 ===
* 21:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7
* 21:05 andrewbogott: truncating *.err and *.out files to clear out NFS space
* 21:04 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7
* 21:04 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-34
* 20:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-34
=== 2024-12-13 ===
* 14:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 14:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component
toolforge-weld
* 14:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld
* 14:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld
* 09:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-68
* 09:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-68
* 09:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-44
* 09:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-44
* 08:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-73
* 08:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-73
=== 2024-12-12 ===
* 10:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-5
* 10:47 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-5
=== 2024-12-06 ===
* 17:26 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-1 ([[phab:T352206|T352206]])
* 17:25 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-1 ([[phab:T352206|T352206]])
* 17:24 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-3 ([[phab:T352206|T352206]])
* 17:23 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-3 ([[phab:T352206|T352206]])
* 07:56 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 07:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
=== 2024-12-05 ===
* 16:34
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 16:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:42 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 14:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 14:06 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 13:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
=== 2024-12-04 ===
* 19:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 19:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 19:26 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission
* 19:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 17:46 andrewbogott: rebooting tools-legacy-redirector-2, many probes failing
* 17:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 17:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 17:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 17:03 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 16:54 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 16:47 sstefanova@cloudcumin1001: START - Cookbook
wmcs.toolforge.component.deploy for component volume-admission * 16:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 16:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:45 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 15:45 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:26 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 15:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 15:18 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:11 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 15:11 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component envvars-api * 15:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 15:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 15:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 14:46 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 14:45 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 01:31 
raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:18 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:17 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:17 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:17 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:15 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:14 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 01:12 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component components-api * 01:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
components-api
=== 2024-12-03 ===
* 22:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 22:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 22:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-harbor
* 21:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-harbor
* 21:55 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component main
* 21:55 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component main
=== 2024-11-29 ===
* 03:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 03:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 03:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 03:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 03:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 03:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
=== 2024-11-27 ===
* 18:26 taavi: kubectl sudo rollout restart -n kube-system deployment coredns # update resolv.conf in coredns containers
=== 2024-11-26 ===
* 10:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-7
* 10:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7
* 10:36 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for
tools-k8s-control-7 * 10:35 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:34 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7 * 10:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:32 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7 * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:31 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-control-7 * 10:30 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-7 * 10:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-9 * 10:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-9 * 10:22 dcaro: rebooting k8s-control-9 * 10:18 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-control-8 * 10:17 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-control-8 * 10:17 dcaro: rebooting k8s-control-8 * 09:15 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-72 * 09:14 dcaro: restarting tools-k8s-worker-nfs-72 * 09:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-72 * 09:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70 * 09:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70 * 09:12 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50 * 09:12 dcaro: restarting tools-k8s-worker-nfs-70 * 09:11 dcaro: restarting 
tools-k8s-worker-nfs-50
* 09:11 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50
* 09:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17
* 09:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17
* 08:34 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-61
* 08:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-61
* 07:30 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers ([[phab:T380827|T380827]])
* 06:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers ([[phab:T380827|T380827]])
=== 2024-11-25 ===
* 13:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli
* 12:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli
=== 2024-11-23 ===
* 07:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder ([[phab:T358225|T358225]])
* 07:21 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder ([[phab:T358225|T358225]])
=== 2024-11-20 ===
* 15:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 15:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 14:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 14:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 12:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy
(exit_code=0) for component envvars-admission * 12:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 00:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission ([[phab:T362867|T362867]]) * 00:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission ([[phab:T362867|T362867]]) === 2024-11-19 === * 21:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 21:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 21:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 21:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 21:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 21:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 21:05 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 20:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 20:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 20:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-emailer * 20:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer * 20:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 20:38 
raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api * 20:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 20:31 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component envvars-api ([[phab:T362867|T362867]]) * 20:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362867|T362867]]) * 20:30 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api ([[phab:T362867|T362867]]) * 20:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api ([[phab:T362867|T362867]]) * 20:17 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T362867|T362867]]) * 20:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T362867|T362867]]) * 20:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T362867|T362867]]) * 20:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T362867|T362867]]) * 19:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission ([[phab:T362867|T362867]]) * 19:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission ([[phab:T362867|T362867]]) * 19:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission ([[phab:T362867|T362867]]) * 19:23 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component ingress-admission ([[phab:T362867|T362867]]) * 15:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 15:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2024-11-18 === * 14:45 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice * 14:39 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice * 14:35 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice * 14:33 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice * 11:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer * 11:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer === 2024-11-15 === * 14:05 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-5.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]]) * 14:04 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-5.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]]) * 14:03 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T352206|T352206]]) * 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]]) * 13:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T352206|T352206]]) * 13:57 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T352206|T352206]]) * 13:50 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix 
(exit_code=99) with prefix 'tools-db' ([[phab:T352206|T352206]]) * 13:49 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]]) === 2024-11-14 === * 13:16 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice * 13:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice * 13:04 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice * 13:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice * 13:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice * 12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice === 2024-11-12 === * 15:50 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 15:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 10:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice * 10:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice * 10:11 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice * 10:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice === 2024-11-11 === * 16:02 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]]) * 15:58 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-4.tools.eqiad1.wikimedia.cloud 
([[phab:T352206|T352206]]) * 14:44 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]]) * 14:42 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-db-4.tools.eqiad1.wikimedia.cloud ([[phab:T352206|T352206]]) * 14:41 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-db' ([[phab:T352206|T352206]]) * 14:37 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-db' ([[phab:T352206|T352206]]) * 14:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 13:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2024-11-10 === * 02:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T362867|T362867]]) * 02:47 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.11.0 ([[phab:T362867|T362867]]) * 02:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T362867|T362867]]) === 2024-11-06 === * 16:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 16:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 15:48 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 15:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 10:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-24 ([[phab:T379139|T379139]]) * 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-24 ([[phab:T379139|T379139]]) * 07:57 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component tools-webservice * 07:52 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component tools-webservice * 07:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 07:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers === 2024-11-05 === * 17:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 17:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 09:40 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 08:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 08:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics * 08:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 08:17 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 07:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 07:44 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico === 2024-11-04 === * 16:39 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 16:34 sstefanova@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component jobs-api * 16:30 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 16:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:22 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api * 16:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 15:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 14:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 14:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:45 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:45 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.27.16 to 1.28.14 
([[phab:T362867|T362867]]) * 14:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-76 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-76 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-75 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-75 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-74 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-74 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-73 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-73 from 1.27.16 to 1.28.14 
([[phab:T362867|T362867]]) * 14:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-72 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-72 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-71 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-71 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-70 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-70 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-69 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-68 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-68 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-67 from 
1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-67 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-66 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-66 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-65 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:25 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-65 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:25 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:24 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:20 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 
1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 14:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node 
tools-k8s-worker-nfs-50 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:56 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:55 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:55 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:54 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:53 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:52 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:51 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node 
tools-k8s-worker-nfs-45 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:50 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:48 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:46 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:44 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:44 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-worker-nfs-39 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:43 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:43 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:37 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:37 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:31 raymond-ndibe@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:30 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:27 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:27 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:25 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:20 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:14 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:13 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:12 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:11 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:10 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:10 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 13:09 raymond-ndibe@cloudcumin1001: START - 
Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:07 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 13:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 13:04 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 13:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 12:55 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:41 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:40 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:39 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:39 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:38 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:37 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:36 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:35 raymond-ndibe@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:22 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:22 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 12:16 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:13 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 12:11 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api * 12:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 12:03 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:59 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component envvars-api * 11:58 
sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 11:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:49 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:42 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:26 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.27.16 to 1.28.14 ([[phab:T362867|T362867]]) * 11:19 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 11:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 10:56 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 10:42 dcaro: added api.svc.toolforge.org dns record entry * 10:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission * 10:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission * 10:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 10:11 sstefanova@cloudcumin1001: START - Cookbook 
wmcs.toolforge.component.deploy for component registry-admission * 09:56 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 09:55 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 09:51 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 09:48 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 09:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2024-10-22 === * 13:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23 * 13:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23 * 12:58 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-33, tools-k8s-woker-nfs-23 * 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-33, tools-k8s-woker-nfs-23 * 09:05 arturo: restart puppetserver service for [[phab:T377803|T377803]] === 2024-10-16 === * 09:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component 
jobs-api === 2024-10-15 === * 17:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 17:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 16:16 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 16:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2024-10-14 === * 09:14 dcaro: migrating pipelineruns stored versions to v1 ([[phab:T376710|T376710]]) * 07:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 07:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 * 07:24 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 * 07:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 === 2024-10-09 === * 09:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 09:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 09:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers === 2024-10-08 === * 13:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld ([[phab:T376710|T376710]]) * 13:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld ([[phab:T376710|T376710]]) * 12:38 dcaro: tests are passing correctly, upgrade finished, will investigate the increased slowness as a followup * 12:27 dcaro: 
upgrade finished, build actions have become slower than usual ([[phab:T376710|T376710]]), running tests and investigating * 12:02 dcaro: starting toolforge builds-builder upgrade, no downtime expected though some builds might fail to start/list/log/show while the upgrade is in progress [[phab:T374908|T374908]] * 08:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers * 08:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers * 08:24 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 08:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers === 2024-10-04 === * 11:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 11:51 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 11:44 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 11:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api === 2024-10-02 === * 09:11 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component maintain-kubeusers * 09:07 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers === 2024-10-01 === * 10:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 10:46 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 10:32 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 10:28 dcaro: updated ci image with latest precommit 
versions * 10:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:52 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component ingress-admission * 09:47 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission === 2024-09-30 === * 18:25 taavi: run striker migrations [[phab:T359428|T359428]] === 2024-09-28 === * 00:14 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 00:07 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli === 2024-09-27 === * 23:58 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 23:52 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld === 2024-09-26 === * 16:45 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 16:40 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 16:24 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 16:18 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 16:18 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component registry-admission * 16:08 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 16:05 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission * 15:58 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 10:26 
wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli * 10:20 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-cli * 10:12 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-cli * 10:05 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component envvars-cli * 07:53 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component toolforge-weld * 07:46 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component toolforge-weld === 2024-09-25 === * 08:00 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-7 * 07:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-7 === 2024-09-24 === * 22:11 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers ([[phab:T375157|T375157]]) * 22:03 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers ([[phab:T375157|T375157]]) * 21:48 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component kyverno ([[phab:T359641|T359641]]) * 21:41 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component kyverno ([[phab:T359641|T359641]]) === 2024-09-20 === * 20:12 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico ([[phab:T341066|T341066]]) * 20:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]]) * 20:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for 
component calico ([[phab:T341066|T341066]]) * 20:06 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]]) * 19:36 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component calico ([[phab:T341066|T341066]]) * 19:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico ([[phab:T341066|T341066]]) * 17:06 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 17:06 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/pod2daemon-flexvol:v3.28.2 ([[phab:T359641|T359641]]) * 17:05 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 17:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 17:04 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/typha:v3.28.2 ([[phab:T359641|T359641]]) * 17:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 17:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 17:03 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/node:v3.28.2 ([[phab:T359641|T359641]]) * 17:03 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 17:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 17:02 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/kube-controllers:v3.28.2 
([[phab:T359641|T359641]]) * 17:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 16:59 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 16:59 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/ctl:v3.28.2 ([[phab:T359641|T359641]]) * 16:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 16:57 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 16:56 raymond-ndibe@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/calico/cni:v3.28.2 ([[phab:T359641|T359641]]) * 16:56 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 16:54 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/calico/cni:v3.28.2 ([[phab:T359641|T359641]]) * 16:54 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 06:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=1) * 00:39 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component wmcs-k8s-metrics ([[phab:T359641|T359641]]) * 00:32 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component wmcs-k8s-metrics ([[phab:T359641|T359641]]) === 2024-09-19 === * 23:17 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=97) ([[phab:T359641|T359641]]) * 23:17 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.10 ([[phab:T359641|T359641]]) * 23:17 wmbot~raymondndibe@wmf3402: START - 
Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 23:12 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 23:11 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.10.1 ([[phab:T359641|T359641]]) * 23:11 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 22:38 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]]) * 22:37 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]]) * 22:37 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 22:36 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]]) * 22:36 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]]) * 22:36 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 22:35 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=97) ([[phab:T359641|T359641]]) * 22:35 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/docker-registry.tools.wmflabs.org/metrics-server:v0.7.1 ([[phab:T359641|T359641]]) * 22:35 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]]) * 17:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-cli ([[phab:T341066|T341066]]) * 17:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for 
component jobs-cli ([[phab:T341066|T341066]]) * 17:13 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api ([[phab:T341066|T341066]]) * 17:06 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 16:48 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]]) * 16:46 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 16:45 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api * 16:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 16:38 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 16:26 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 16:10 dcaro: rebooting tools-k8s-worker-nfs-24 it's stuck without network * 16:08 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 16:08 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 16:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 16:07 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 16:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 15:28 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]]) * 15:27 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 15:19 
wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]]) * 15:18 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 15:08 wmbot~raymondndibe@wmf3402: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component jobs-api ([[phab:T341066|T341066]]) * 15:07 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 15:01 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api ([[phab:T341066|T341066]]) * 14:57 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) * 14:56 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api ([[phab:T341066|T341066]]) * 14:50 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api ([[phab:T341066|T341066]]) === 2024-09-17 === * 08:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-70 ([[phab:T359641|T359641]]) * 08:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-70 ([[phab:T359641|T359641]]) * 08:43 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]]) * 08:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-75 ([[phab:T359641|T359641]]) * 08:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]]) * 08:40 wmbot~dcaro@urcuchillay: START - Cookbook 
wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-75 ([[phab:T359641|T359641]]) * 08:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]]) * 08:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud ([[phab:T359641|T359641]]) * 03:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:20 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:19 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:19 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:18 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 03:13 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-64 * 03:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-63 * 03:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]]) * 03:07 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 03:07 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-76.tools.eqiad1.wikimedia.cloud to the cluster * 03:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]]) * 03:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 03:00 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-75.tools.eqiad1.wikimedia.cloud to the cluster * 02:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:46 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-74.tools.eqiad1.wikimedia.cloud to the cluster * 02:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-62 * 02:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-60 * 02:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-62 ([[phab:T359641|T359641]]) * 02:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]]) * 02:38 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:38 raymond-ndibe@cloudcumin1001: Added a new k8s 
worker-nfs tools-k8s-worker-nfs-73.tools.eqiad1.wikimedia.cloud to the cluster * 02:36 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:32 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-72.tools.eqiad1.wikimedia.cloud to the cluster * 02:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:24 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:24 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-71.tools.eqiad1.wikimedia.cloud to the cluster * 02:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:12 raymond-ndibe@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 02:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-6 * 02:10 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-56 * 02:08 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 02:08 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-70.tools.eqiad1.wikimedia.cloud to the cluster * 02:05 
raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 02:04 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]]) * 02:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-49 * 02:02 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-31 * 01:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:58 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:57 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-69.tools.eqiad1.wikimedia.cloud to the cluster * 01:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]]) * 01:57 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]]) * 01:56 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-30 * 01:54 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]]) * 01:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-29 * 01:50 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-30 
([[phab:T359641|T359641]]) * 01:49 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-64 ([[phab:T359641|T359641]]) * 01:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:48 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]]) * 01:46 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-64 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:46 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]]) * 01:45 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-nfs-28 * 01:42 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:42 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-68.tools.eqiad1.wikimedia.cloud to the cluster * 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]]) * 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-64 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:40 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-63 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:40 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]]) * 01:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-62 
([[phab:T359641|T359641]]) * 01:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:34 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-62 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]]) * 01:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:32 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-67.tools.eqiad1.wikimedia.cloud to the cluster * 01:29 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-62 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:29 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-60 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:28 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-60 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) 
* 01:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:23 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 01:23 raymond-ndibe@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-66.tools.eqiad1.wikimedia.cloud to the cluster * 01:23 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]]) * 01:22 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-60 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:22 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-6 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:21 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]]) * 01:16 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:16 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-56 ([[phab:T359641|T359641]]) * 01:15 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:15 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-56 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:14 raymond-ndibe@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]]) * 01:14 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster ([[phab:T359641|T359641]]) * 01:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-49 ([[phab:T359641|T359641]]) * 01:08 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:08 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-49 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]]) * 01:02 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:02 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]]) * 01:01 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.26.15 to 1.27.16 
([[phab:T359641|T359641]])
* 01:01 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:00 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 01:00 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-31 ([[phab:T359641|T359641]])
* 00:59 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:59 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-31 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:58 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-30 ([[phab:T359641|T359641]])
* 00:53 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:53 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-30 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:52 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]])
* 00:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-29 ([[phab:T359641|T359641]])
* 00:47 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:47 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-29 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:47 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]])
* 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-28 ([[phab:T359641|T359641]])
* 00:41 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:41 raymond-ndibe@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-28 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:35 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:35 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:34 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:34 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:33 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:33 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:32 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:32 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:31 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.26.15 to 1.27.16 ([[phab:T359641|T359641]])
* 00:30 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-60, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-62, tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]])
* 00:26 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-56, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]])
* 00:10 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50, tools-k8s-worker-nfs-56, tools-k8s-worker-nfs-57, tools-k8s-worker-nfs-6 ([[phab:T359641|T359641]])
* 00:09 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-38, tools-k8s-worker-nfs-46, tools-k8s-worker-nfs-49, tools-k8s-worker-nfs-50 ([[phab:T359641|T359641]])
* 00:09 raymond-ndibe@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-60, tools-k8s-worker-nfs-61, tools-k8s-worker-nfs-62, tools-k8s-worker-nfs-63 ([[phab:T359641|T359641]])
* 00:04 raymond-ndibe@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-31, tools-k8s-worker-nfs-32, tools-k8s-worker-nfs-33, tools-k8s-worker-nfs-36 ([[phab:T359641|T359641]])
=== 2024-09-16 ===
* 17:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-45
* 17:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-45
* 17:46 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6
* 17:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6
=== 2024-09-13 ===
* 11:18 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54 ([[phab:T374692|T374692]])
* 11:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54 ([[phab:T374692|T374692]])
* 09:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:12 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
* 09:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 ([[phab:T374692|T374692]])
=== 2024-09-12 ===
* 12:06 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:54 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-23, tools-k8s-worker-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23, tools-k8s-worker-16, tools-k8s-worker-nfs-33 ([[phab:T374612|T374612]])
* 11:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-28 ([[phab:T374612|T374612]])
* 11:37 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-28 ([[phab:T374612|T374612]])
=== 2024-09-11 ===
* 10:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
=== 2024-09-09 ===
* 16:23 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component cert-manager
* 16:16 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component cert-manager
=== 2024-09-06 ===
* 08:47 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 08:42 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 08:38 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component jobs-api
* 08:36 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 07:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0)
* 07:14 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/pause:3.6
* 07:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry
=== 2024-09-05 ===
* 13:50 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:50 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/stakater-reloader:v1.1.0 ([[phab:T359641|T359641]])
* 13:50 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:46 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:45 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]])
* 13:45 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:41 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]])
* 13:41 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]])
* 13:41 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:40 wmbot~raymondndibe@wmf3402: END (FAIL) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=99) ([[phab:T359641|T359641]])
* 13:40 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/startupapicheck:v1.15.3 ([[phab:T359641|T359641]])
* 13:40 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:28 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:27 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/cainjector:v1.15.3 ([[phab:T359641|T359641]])
* 13:27 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:26 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:26 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/webhook:v1.15.3 ([[phab:T359641|T359641]])
* 13:26 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
* 13:24 wmbot~raymondndibe@wmf3402: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:23 wmbot~raymondndibe@wmf3402: Updating container image docker-registry.tools.wmflabs.org/cert-manager/controller:v1.15.3 ([[phab:T359641|T359641]])
* 13:23 wmbot~raymondndibe@wmf3402: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry ([[phab:T359641|T359641]])
=== 2024-09-04 ===
* 14:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 14:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 14:02 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component maintain-kubeusers
* 13:56 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component maintain-kubeusers
* 13:41 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 13:37 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 13:36 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 13:35 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 13:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 13:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
* 13:02 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 13:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission
=== 2024-09-03 ===
* 20:19 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission
* 19:53 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-emailer
* 19:48 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component jobs-emailer
* 19:36 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component volume-admission
* 19:29 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission
* 15:46 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component kyverno
* 15:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component kyverno
* 15:29 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component kyverno
* 15:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component kyverno
* 14:41 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99)
* 14:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-admission
* 14:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-admission
* 14:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.28.5 ([[phab:T359641|T359641]])
* 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.12.5 ([[phab:T359641|T359641]])
* 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.12.5 ([[phab:T359641|T359641]])
* 14:05 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.12.5 ([[phab:T359641|T359641]])
* 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.12.5 ([[phab:T359641|T359641]])
* 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno-cli:v1.12.5 ([[phab:T359641|T359641]])
* 14:04 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.12.5 ([[phab:T359641|T359641]])
* 14:04 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry ([[phab:T359641|T359641]])
* 13:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission
* 13:56 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) ([[phab:T359641|T359641]])
* 13:55 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.28.5 ([[phab:T359641|T359641]])
* 13:54 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.12.5 ([[phab:T359641|T359641]])
* 13:54 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.12.5 ([[phab:T359641|T359641]])
* 13:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry ([[phab:T359641|T359641]])
* 13:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission
* 13:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway
* 13:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway
* 13:04 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api
* 12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api
* 11:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api
* 11:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api
* 10:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api
* 10:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-api
* 09:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder
* 09:51 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder
* 05:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.25.16 to 1.26.15
* 05:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.25.16 to 1.26.15
* 05:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.25.16 to 1.26.15
* 05:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.25.16 to 1.26.15
* 05:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.25.16 to 1.26.15
* 05:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.25.16 to 1.26.15
=== 2024-09-02 ===
* 14:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-108 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:30 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-108 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-64 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:19 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-64 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 14:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.25.16 to 1.26.15
* 13:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.25.16 to 1.26.15
* 13:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.25.16 to 1.26.15
* 13:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.25.16 to 1.26.15
* 13:30 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.25.16 to 1.26.15
* 13:30 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-63 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:29 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-62 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.25.16 to 1.26.15
* 13:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.25.16 to 1.26.15
* 13:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-62 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-61 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:27 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.25.16 to 1.26.15
* 13:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-61 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:27 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-60 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-60 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-58 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:25 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.25.16 to 1.26.15
* 13:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-58 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-57 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.25.16 to 1.26.15
* 13:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-57 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.25.16 to 1.26.15
* 13:22 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.25.16 to 1.26.15
* 13:22 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.25.16 to 1.26.15
* 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.25.16 to 1.26.15
* 13:20 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:20 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:17 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-51 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:17 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:17 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.25.16 to 1.26.15
* 13:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.25.16 to 1.26.15
* 13:14 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:14 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.25.16 to 1.26.15
* 13:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.25.16 to 1.26.15
* 13:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.25.16 to 1.26.15
* 13:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.25.16 to 1.26.15
* 13:11 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:11 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.25.16 to 1.26.15
* 13:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.25.16 to 1.26.15
* 13:08 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.25.16 to 1.26.15
* 13:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.25.16 to 1.26.15
* 13:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.25.16 to 1.26.15
* 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.25.16 to 1.26.15
* 13:05 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.25.16 to 1.26.15
* 13:05 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.25.16 to 1.26.15
* 13:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:04 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.25.16 to 1.26.15
* 13:02 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:02 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:01 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.25.16 to 1.26.15
* 13:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:01 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.25.16 to 1.26.15
* 13:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:00 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 13:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.25.16 to 1.26.15
* 12:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.25.16 to 1.26.15
* 12:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.25.16 to 1.26.15
* 12:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.25.16 to 1.26.15
* 12:56 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.25.16 to 1.26.15
* 12:55 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.25.16 to 1.26.15
* 12:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]])
* 12:54 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.25.16 to 1.26.15 * 12:54 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.25.16 to 1.26.15 ([[phab:T370249|T370249]]) * 12:47 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.25.16 to 1.26.15 * 12:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.25.16 to 1.26.15 * 12:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.25.16 to 1.26.15 * 12:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.25.16 to 1.26.15 * 12:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.25.16 to 1.26.15 * 12:40 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.25.16 to 1.26.15 * 12:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.25.16 to 1.26.15 * 12:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.25.16 to 1.26.15 * 12:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.25.16 to 1.26.15 * 12:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.25.16 to 1.26.15 * 12:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.25.16 to 1.26.15 * 12:31 sstefanova@cloudcumin1001: 
START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.25.16 to 1.26.15 * 12:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.25.16 to 1.26.15 * 12:27 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.25.16 to 1.26.15 * 12:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.25.16 to 1.26.15 * 12:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.25.16 to 1.26.15 * 12:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.25.16 to 1.26.15 * 12:12 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.25.16 to 1.26.15 * 12:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.25.16 to 1.26.15 * 12:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.25.16 to 1.26.15 * 11:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.25.16 to 1.26.15 * 11:48 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.25.16 to 1.26.15 * 11:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.25.16 to 1.26.15 * 11:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.25.16 to 1.26.15 * 10:05 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) 
for component builds-api * 09:58 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 09:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component registry-admission * 09:43 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component registry-admission * 09:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 09:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 08:48 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.component.deploy (exit_code=97) for component components-api * 08:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2024-08-29 === * 16:32 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 16:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 08:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-nginx * 07:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component ingress-nginx === 2024-08-27 === * 12:06 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 12:06 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/nginx-ingress-controller:v1.11.2 * 12:06 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 09:46 wmbot~dcaro@urcuchillay: END (PASS) 
- Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:46 wmbot~dcaro@urcuchillay: Added a new k8s worker tools-k8s-worker-108.tools.eqiad1.wikimedia.cloud to the cluster * 09:36 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component calico * 08:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component calico * 08:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component calico * 08:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component calico * 08:55 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 08:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 08:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-52 ([[phab:T373243|T373243]]) * 08:37 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-52 ([[phab:T373243|T373243]]) * 08:35 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-51 ([[phab:T373243|T373243]]) * 08:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-51 ([[phab:T373243|T373243]]) * 08:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-25 ([[phab:T373243|T373243]]) * 08:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-25 ([[phab:T373243|T373243]]) * 08:31 
wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-18 ([[phab:T373243|T373243]]) * 08:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-18 ([[phab:T373243|T373243]]) * 08:29 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-15 ([[phab:T373243|T373243]]) * 08:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-15 ([[phab:T373243|T373243]]) * 08:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]]) * 08:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]]) * 08:19 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster * 08:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster === 2024-08-26 === * 21:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 21:13 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-64.tools.eqiad1.wikimedia.cloud to the cluster * 21:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 21:03 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster * 21:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 20:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 20:23 wmbot~dcaro@urcuchillay: Added a 
new k8s worker-nfs tools-k8s-worker-nfs-63.tools.eqiad1.wikimedia.cloud to the cluster * 20:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 20:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 20:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase * 18:35 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:49 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 17:49 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-62.tools.eqiad1.wikimedia.cloud to the cluster * 17:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 17:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase * 17:33 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 17:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 17:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.quota_increase * 17:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 17:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 17:04 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook 
wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 17:04 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-61.tools.eqiad1.wikimedia.cloud to the cluster * 16:54 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:54 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:54 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-60.tools.eqiad1.wikimedia.cloud to the cluster * 16:42 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:30 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 16:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:14 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-58.tools.eqiad1.wikimedia.cloud to the cluster * 16:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 16:02 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 16:02 wmbot~dcaro@urcuchillay: Added a new k8s worker-nfs tools-k8s-worker-nfs-57.tools.eqiad1.wikimedia.cloud to the cluster * 15:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:49 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 15:44 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:39 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:38 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=97) for a worker-nfs role in the tools cluster * 15:35 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:33 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:15 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 15:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-4 ([[phab:T373243|T373243]]) * 13:12 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-4, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-18, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-51, tools-k8s-worker-nfs-52, tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 13:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-4, tools-k8s-worker-nfs-15, tools-k8s-worker-nfs-18, tools-k8s-worker-nfs-25, tools-k8s-worker-nfs-51, tools-k8s-worker-nfs-52, tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:53 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:53 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:44 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 12:42 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-104 ([[phab:T373243|T373243]]) * 11:06 dcaro: manually deleted the coredns pods that had been around for 4d * 09:08 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:02 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 09:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 09:00 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component api-gateway * 08:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 08:18 dcaro: scale up coredns deployment to 4 replicas === 2024-08-21 === * 05:44 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 05:38 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component components-api * 05:27 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-builder * 05:20 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-builder * 05:01 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 04:55 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 04:43 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy 
(exit_code=0) for component volume-admission * 04:36 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:28 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 04:25 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:22 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 04:21 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:20 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component volume-admission * 04:20 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component volume-admission * 04:10 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component ingress-admission * 04:03 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component ingress-admission * 03:49 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 03:42 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 03:33 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 03:28 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 03:19 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.component.deploy (exit_code=99) for component builds-api * 03:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 03:13 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.component.deploy for component builds-api === 2024-08-19 === * 22:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot 
(exit_code=0) for tools-k8s-worker-nfs-24 * 21:56 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-24 * 21:52 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-17 * 21:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17 * 21:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-17,tools-k8s-worker-nfs-24 * 21:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-17,tools-k8s-worker-nfs-24 === 2024-08-15 === * 06:30 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-20 * 06:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-20 === 2024-08-13 === * 09:54 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 09:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 07:39 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-6 * 07:33 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-6 === 2024-08-12 === * 15:33 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 15:27 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway * 12:31 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 12:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 11:51 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component tools-webservice * 11:46 wmbot~dcaro@urcuchillay: START - 
Cookbook wmcs.toolforge.component.deploy for component tools-webservice * 10:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component jobs-api * 10:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component jobs-api * 09:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component api-gateway * 09:50 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.component.deploy for component api-gateway === 2024-08-08 === * 16:57 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component envvars-api * 16:51 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component envvars-api * 16:36 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component builds-api * 16:30 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component builds-api * 16:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.component.deploy (exit_code=0) for component components-api * 16:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.component.deploy for component components-api === 2024-08-06 === * 09:50 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=1) * 09:50 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:50 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:20 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:20 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 09:19 wmbot~dcaro@urcuchillay: START - 
Cookbook wmcs.openstack.cloudvirt.vm_console * 09:19 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 09:19 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console === 2024-08-05 === * 13:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api * 13:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api * 11:42 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 11:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 09:18 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:18 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 08:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 08:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway === 2024-08-01 === * 20:42 bd808: Uncordoned tools-k8s-worker-nfs-55 following reboot * 20:40 bd808: Hard reboot of tools-k8s-worker-nfs-55 following drain cookbook run. Stuck pod remained stuck as expected. 
* 20:37 bd808@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-55 * 20:32 bd808: Draining and rebooting tools-k8s-worker-nfs-55 after reports of stuck pods via irc * 20:32 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-55 * 15:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api * 15:31 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api === 2024-07-31 === * 20:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 20:36 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 20:26 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-cli * 20:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 16:17 andrewbogott: changing login.tools.wmlabs.org to point to a newer bastion, tools-bastion-12, in response to [[phab:T371505|T371505]] * 11:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 11:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 11:33 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component components-api * 11:33 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component components-api * 10:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-43 * 09:49 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for 
tools-k8s-worker-nfs-22, tools-k8s-worker-nfs-26, tools-k8s-worker-nfs-43 === 2024-07-30 === * 18:08 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 18:06 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 18:06 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 18:05 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 18:02 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 18:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 18:02 wmbot~raymond@ubuntu: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-cli * 18:01 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:59 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 17:59 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:49 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 17:49 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:40 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component jobs-cli * 17:39 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 17:37 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 17:36 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 16:34 wmbot~dcaro@urcuchillay: 
END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-23 * 16:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-23 === 2024-07-29 === * 18:24 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 18:23 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 18:06 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 18:05 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 16:24 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 16:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 14:05 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0) * 14:03 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance * 13:19 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-cli * 13:18 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-cli * 12:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-cli * 12:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-cli * 12:01 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-cli * 12:00 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-cli === 2024-07-25 === * 15:19 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 15:19 
dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 08:37 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 08:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics === 2024-07-24 === * 09:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 09:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx * 08:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission * 08:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission * 07:07 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component ingress-admission * 06:57 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission === 2024-07-23 === * 15:04 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 15:04 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 13:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 13:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 12:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 12:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 12:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 12:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 12:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 08:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 08:00 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-07-22 === * 17:42 dcaro: moved the apt repo to service endpoint deb.svc.toolforge.org * 17:39 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-3 * 17:38 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-3 * 17:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 17:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 17:00 dcaro: moving the toolforge apt repo to tools-services-06 * 16:55 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-services-06.tools.eqiad1.wikimedia.cloud * 16:53 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-services-06.tools.eqiad1.wikimedia.cloud * 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 09:58 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 
2024-07-19 === * 12:46 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 12:46 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.9.2 * 12:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry * 10:03 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) * 10:02 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/nginx-ingress-controller:v1.9.6 * 10:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry === 2024-07-18 === * 14:39 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 14:39 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 08:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 08:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-07-17 === * 14:50 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 11:12 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-builder * 11:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 10:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 10:13 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 10:13 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 09:07 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 08:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 08:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx === 2024-07-16 === * 15:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 15:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.24.17 to 1.25.16 * 14:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.24.17 to 1.25.16 * 14:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.24.17 to 1.25.16 * 14:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.24.17 to 1.25.16 * 14:09 
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.24.17 to 1.25.16 * 14:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.24.17 to 1.25.16 * 11:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 11:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:31 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.24.17 to 1.25.16 * 11:30 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.24.17 to 1.25.16 * 11:30 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.24.17 to 1.25.16 * 11:28 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.24.17 to 1.25.16 * 11:28 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.24.17 to 1.25.16 * 11:27 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.24.17 to 1.25.16 * 11:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-25 from 1.24.17 to 1.25.16 * 11:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-25 from 1.24.17 to 1.25.16 * 11:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.24.17 to 1.25.16 * 11:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 11:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.24.17 to 1.25.16 * 11:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.24.17 to 1.25.16 * 11:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 11:22 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.24.17 to 1.25.16 * 11:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.24.17 to 1.25.16 * 11:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.24.17 to 1.25.16 * 11:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-107 from 1.24.17 to 1.25.16 * 11:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-107 from 1.24.17 to 1.25.16 * 11:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-106 from 1.24.17 to 1.25.16 * 11:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-106 from 1.24.17 to 1.25.16 * 11:13 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-105 from 1.24.17 to 1.25.16 * 11:12 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-105 from 1.24.17 to 1.25.16 * 11:12 sstefanova@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 11:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 11:10 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-nfs-worker-21 from 1.24.17 to 1.25.16 * 11:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-nfs-worker-21 from 1.24.17 to 1.25.16 * 11:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 * 11:02 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 * 10:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-104 from 1.24.17 to 1.25.16 * 10:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.24.17 to 1.25.16 * 10:57 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.24.17 to 1.25.16 * 10:57 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 10:56 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.24.17 to 1.25.16 * 10:55 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.24.17 to 1.25.16 * 10:54 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.24.17 to 1.25.16 * 10:53 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 
1.24.17 to 1.25.16 * 10:52 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.24.17 to 1.25.16 * 10:51 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.24.17 to 1.25.16 * 10:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.24.17 to 1.25.16 * 10:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.24.17 to 1.25.16 * 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.24.17 to 1.25.16 * 10:50 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.24.17 to 1.25.16 * 10:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.24.17 to 1.25.16 * 10:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.24.17 to 1.25.16 * 10:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.24.17 to 1.25.16 * 10:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.24.17 to 1.25.16 * 10:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-18 from 1.24.17 to 1.25.16 * 10:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-18 from 1.24.17 to 1.25.16 * 10:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.24.17 to 1.25.16 * 10:47 
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.24.17 to 1.25.16 * 10:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.24.17 to 1.25.16 * 10:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.24.17 to 1.25.16 * 10:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.24.17 to 1.25.16 * 10:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.24.17 to 1.25.16 * 10:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-15 from 1.24.17 to 1.25.16 * 10:45 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-52 from 1.24.17 to 1.25.16 * 10:44 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-15 from 1.24.17 to 1.25.16 * 10:44 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.24.17 to 1.25.16 * 10:44 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-52 from 1.24.17 to 1.25.16 * 10:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.24.17 to 1.25.16 * 10:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.24.17 to 1.25.16 * 10:43 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-51 from 1.24.17 to 1.25.16 * 10:42 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.24.17 to 1.25.16 * 10:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.24.17 to 1.25.16 * 10:42 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.24.17 to 1.25.16 * 10:41 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.24.17 to 1.25.16 * 10:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.24.17 to 1.25.16 * 10:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.24.17 to 1.25.16 * 10:40 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.24.17 to 1.25.16 * 10:40 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.24.17 to 1.25.16 * 10:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.24.17 to 1.25.16 * 10:40 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.24.17 to 1.25.16 * 10:39 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.24.17 to 1.25.16 * 10:39 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.24.17 to 1.25.16 * 10:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.24.17 to 1.25.16 * 10:39 sstefanova@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.24.17 to 1.25.16 * 10:38 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.24.17 to 1.25.16 * 10:38 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.24.17 to 1.25.16 * 10:38 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-48 from 1.24.17 to 1.25.16 * 10:37 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.24.17 to 1.25.16 * 10:37 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.24.17 to 1.25.16 * 10:37 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.24.17 to 1.25.16 * 10:36 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.24.17 to 1.25.16 * 10:35 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.24.17 to 1.25.16 * 10:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.24.17 to 1.25.16 * 10:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:34 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.24.17 to 1.25.16 * 10:34 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade 
(exit_code=0) for node tools-k8s-worker-nfs-45 from 1.24.17 to 1.25.16 * 10:32 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.24.17 to 1.25.16 * 10:32 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.24.17 to 1.25.16 * 10:31 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.24.17 to 1.25.16 * 10:31 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.24.17 to 1.25.16 * 10:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.24.17 to 1.25.16 * 10:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 1.24.17 to 1.25.16 * 10:28 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.24.17 to 1.25.16 * 10:27 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.24.17 to 1.25.16 * 10:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.24.17 to 1.25.16 * 10:26 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.24.17 to 1.25.16 * 10:25 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.24.17 to 1.25.16 * 10:24 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.24.17 to 1.25.16 * 10:23 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node 
tools-k8s-worker-nfs-39 from 1.24.17 to 1.25.16 * 10:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.24.17 to 1.25.16 * 10:22 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.24.17 to 1.25.16 * 10:21 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.24.17 to 1.25.16 * 10:20 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.24.17 to 1.25.16 * 10:19 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.24.17 to 1.25.16 * 10:18 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.24.17 to 1.25.16 * 10:17 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.24.17 to 1.25.16 * 10:16 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.24.17 to 1.25.16 * 10:16 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.24.17 to 1.25.16 * 10:15 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.24.17 to 1.25.16 * 10:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node 
tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16 * 10:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 10:14 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.24.17 to 1.25.16 * 10:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission * 10:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16 * 10:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-3 from 1.24.17 to 1.25.16 * 10:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.24.17 to 1.25.16 * 10:12 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.24.17 to 1.25.16 * 10:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.24.17 to 1.25.16 * 10:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.24.17 to 1.25.16 * 10:11 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.24.17 to 1.25.16 * 10:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.24.17 to 1.25.16 * 10:11 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.24.17 to 1.25.16 * 10:11 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:11 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.24.17 to 1.25.16 * 10:10 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=97) for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16 * 10:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.24.17 to 1.25.16 * 10:10 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-4 from 1.24.17 to 1.25.16 * 10:10 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.24.17 to 1.25.16 * 10:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.24.17 to 1.25.16 * 10:09 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-4 from 1.24.17 to 1.25.16 * 10:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.24.17 to 1.25.16 * 10:08 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.24.17 to 1.25.16 * 10:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.24.17 to 1.25.16 * 09:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.24.17 to 1.25.16 * 09:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.24.17 to 1.25.16 * 09:50 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-1 from 1.24.17 to 1.25.16 * 09:50 aborrero@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-1 from 1.24.17 to 1.25.16 * 09:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.24.17 to 1.25.16 * 09:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.24.17 to 1.25.16 * 09:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.24.17 to 1.25.16 * 09:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.24.17 to 1.25.16 * 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.24.17 to 1.25.16 * 09:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.24.17 to 1.25.16 * 09:07 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.24.17 to 1.25.16 * 09:06 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.24.17 to 1.25.16 * 08:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 08:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission === 2024-07-15 === * 14:42 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:42 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:40 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:40 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
builds-builder * 08:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 08:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission === 2024-07-11 === * 17:49 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:49 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 13:49 dcaro: deploy toolforge-jobs-framework 16.0.13 ([[phab:T369573|T369573]]) * 11:55 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 11:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission === 2024-07-10 === * 17:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 17:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 16:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 16:57 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 16:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 16:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 15:16 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component 
maintain-kubeusers
* 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:10 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 10:10 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-07-09 ===
* 14:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 14:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 14:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:18 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-07-08 ===
* 20:22 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-37
* 20:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-37
* 14:09 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 14:08 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 13:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-3
* 13:57 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-3
* 13:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-2
* 13:56 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-2
* 13:56 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-elastic-1
* 13:56
andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-elastic-1
* 13:36 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 13:36 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 13:20 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 13:20 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
* 12:49 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
* 12:49 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
* 12:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 11:59 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 08:46 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 08:46 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
=== 2024-07-05 ===
* 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 12:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 12:34 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 12:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 12:29 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0)
* 12:29
wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4
* 12:29 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7
* 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7
* 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7
* 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7
* 12:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7
* 12:27 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 12:27 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99)
* 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7
* 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7
* 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7
* 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7
* 12:26 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7
* 12:26 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry
* 12:23 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0)
* 12:23 sstefanova@cloudcumin1001: Updating container image docker-registry.tools.wmflabs.org/kube-state-metrics:v2.7.0
* 12:23
sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry
* 11:29 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.image.copy_to_registry (exit_code=0) copy image from bitnami/kubectl:1.26.4 to docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4
* 11:28 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4
* 11:28 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.image.copy_to_registry copy image from bitnami/kubectl:1.26.4 to docker-registry.tools.wmflabs.org/bitnami-kubectl:1.26.4
* 01:47 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 01:46 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-07-04 ===
* 17:09 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 17:09 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 12:57 arturo: updating kubelet flags [[phab:T355881|T355881]]
* 12:00 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 12:00 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 11:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 11:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 09:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 09:43 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 09:34 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
*
09:34 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 07:54 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 07:53 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
=== 2024-07-03 ===
* 12:25 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 12:25 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 10:21 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 10:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 09:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 09:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
=== 2024-07-02 ===
* 17:16 andrewbogott: draining (I hope) tools-elastic-3 and tools-elastic-1 for [[phab:T311905|T311905]]
* 17:07 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 17:07 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
* 16:55 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 16:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 15:01 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 15:01 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 11:53 arturo: cleanup kubeadm configmap from TTLAfterFinished settings
([[phab:T349197|T349197]])
* 11:51 arturo: remove --feature-gates=TTLAfterFinished=true from kube-controller-manager static pod definition ([[phab:T349197|T349197]])
* 10:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 10:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
* 09:56 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 09:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 09:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component cert-manager
* 09:22 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component cert-manager
* 09:10 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder
* 09:10 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder
=== 2024-07-01 ===
* 15:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 15:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 14:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission
* 14:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission
* 14:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 13:21 dcaro@cloudcumin1001: END (PASS) - Cookbook
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission
* 13:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission
* 13:06 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 13:06 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
=== 2024-06-28 ===
* 11:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 11:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 09:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics
* 09:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics
* 09:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 09:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 09:38 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 09:37 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 09:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway
* 09:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway
=== 2024-06-27 ===
* 16:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-23
* 16:44 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-23
* 16:22 andrew@cloudcumin1001: END
(PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-db-1
* 16:21 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-1
* 15:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-db-1
* 15:49 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-1
* 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-db-3
* 15:46 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-db-3
* 15:40 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-24
* 15:37 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-24
* 15:36 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-etcd-22
* 15:33 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-etcd-22
* 15:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component cert-manager
* 15:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component cert-manager
* 14:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx
* 14:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx
* 11:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 11:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:02 arturo: drop all PSP definitions for all accounts ([[phab:T368142|T368142]])
*
10:02 arturo: disabled PodSecurityPolicy admission plugin from kubeadm configmap ([[phab:T368142|T368142]])
* 09:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 08:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 08:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-06-26 ===
* 11:40 taavi: update pywikibot image to 9.2 [[phab:T363631|T363631]]
* 10:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:18 arturo: deploying toolforge-webservice 0.103.9 ([[phab:T368463|T368463]])
* 09:18 arturo: setting kyverno policies to Enforce ([[phab:T368141|T368141]])
* 09:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 08:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-29
* 08:01 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-29
=== 2024-06-25 ===
* 21:50 bd808: Live hacked /usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py on login-buster.toolforge.org to remove the `-> dict[str, Any]` type annotations causing [[phab:T368463|T368463]]
* 12:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-104
* 12:30 taavi@cloudcumin1001: START
- Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-104
* 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-103
* 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-104
* 12:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-104
* 12:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-103
* 12:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-102
* 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-103
* 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-103
* 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-102
* 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-56
* 12:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-102
* 12:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-102
* 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-56
* 12:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-55
* 12:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-55
* 12:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-54
* 12:22 taavi@cloudcumin1001: END (PASS) - Cookbook
wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-56
* 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-56
* 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-54
* 12:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-53
* 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-55
* 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-55
* 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-53
* 12:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-54
* 12:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-nfs-52
* 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-54
* 12:16 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-52
* 12:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 12:14 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 12:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-51
* 12:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-53
* 12:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-51
* 12:11 taavi@cloudcumin1001: START - Cookbook
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-53
* 11:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-50
* 11:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-52
* 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-50
* 11:56 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-50
* 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-50
* 11:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-52
* 11:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-51
* 11:51 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-k8s-worker-50
* 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-51
* 11:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-50
* 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-50
* 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-50
* 11:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-proxy-7
* 11:10 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-proxy-7
* 11:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.migrate_floating_ip (exit_code=0) for address 185.15.56.11 to server 'tools-proxy-8'
* 11:09 taavi@cloudcumin1001: START - Cookbook
wmcs.vps.migrate_floating_ip for address 185.15.56.11 to server 'tools-proxy-8'
* 09:44 arturo: deploy toolforge-webservice 0.103.8 ([[phab:T362050|T362050]])
* 09:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-haproxy-6
* 09:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-haproxy-6
* 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-9
* 09:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-9
* 09:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 09:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-9
* 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-9
* 08:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-49
* 08:48 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-49
* 08:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-48
* 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-49
* 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-48
* 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-49
* 08:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs
(exit_code=0) for server tools-k8s-worker-nfs-47
* 08:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-48
* 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-48
* 08:45 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-47
* 08:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-46
* 08:44 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-46
* 08:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-45
* 08:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-47
* 08:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-47
* 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-45
* 08:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-44
* 08:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-46
* 08:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-46
* 08:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-44
* 08:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-45
* 08:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-45
* 08:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs
(exit_code=99) for server tools-k8s-worker-nfs-43
* 08:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-43
* 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-42
* 08:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-44
* 08:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-44
* 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-43
* 08:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-43
* 08:36 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-42
* 08:13 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-42
* 08:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-42
* 08:07 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-42
* 08:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-41
* 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-42
* 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-41
* 08:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-40
* 07:59 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-40
* 07:59 taavi@cloudcumin1001: END (PASS) - Cookbook
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-39
* 07:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-41
* 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-41
* 07:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-39
* 07:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-38
* 07:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-40
* 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-40
* 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-38
* 07:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-37
* 07:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-39
* 07:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-39
* 07:55 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-37
* 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-36
* 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-38
* 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-38
* 07:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-36
* 07:41 taavi@cloudcumin1001: END (PASS) - Cookbook
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-35
* 07:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-37
* 07:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-37
* 07:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-35
* 07:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-34
* 07:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-36
* 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-36
* 07:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-34
* 07:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-35
* 07:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-33
* 07:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-35
* 07:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-34
* 07:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-34
* 07:31 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-33
* 07:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-33
* 07:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-33
=== 2024-06-24 ===
* 20:56 andrewbogott: rebooting
tools-k8s-worker-nfs-36; it has lots of stuck processes which somehow didn't get unstuck when we did the post-nfs-migration reboots. * 15:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-32 * 15:53 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-32 * 15:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-31 * 15:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-32 * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-31 * 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-32 * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-30 * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-31 * 15:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-31 * 15:48 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-30 * 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-29 * 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-30 * 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-30 * 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-29 * 15:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for 
server tools-k8s-worker-nfs-28 * 15:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-29 * 15:45 arturo: deploy toolforge-webservice 0.103.7 ([[phab:T362050|T362050]]) * 15:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-29 * 15:44 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-28 * 15:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-27 * 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-28 * 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-27 * 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-28 * 15:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-27 * 15:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-27 * 15:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers * 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-sgebastion-10 * 14:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-sgebastion-10 * 14:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-bastion-13 * 14:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-bastion-13 * 14:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-bastion-12 * 14:30 taavi@cloudcumin1001: START - Cookbook 
wmcs.openstack.migrate_server_to_ovs for server tools-bastion-12 * 14:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers * 14:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-nfs-2 * 14:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-nfs-2 * 13:57 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=99) for server tools-nfs-2 * 13:57 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-nfs-2 * 13:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_dbinstance_to_ovs (exit_code=0) for server tbd * 13:43 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_dbinstance_to_ovs for server tbd * 13:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-26 * 13:41 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-26 * 13:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-25 * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-25 * 13:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-26 * 13:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-24 * 13:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-26 * 13:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-25 * 13:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-24 * 
13:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-25 * 13:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-23 * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-24 * 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-23 * 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-24 * 13:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-22 * 13:29 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-22 * 13:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-21 * 13:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-23 * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-23 * 13:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-21 * 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-20 * 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-22 * 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-22 * 13:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-20 * 13:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-21 * 
13:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-19 * 13:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-21 * 13:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-19 * 13:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-18 * 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-18 * 13:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-20 * 13:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-17 * 13:18 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-20 * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-19 * 13:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-19 * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-18 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-18 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-17 * 13:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-17 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-17 * 13:15 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-17 * 13:15 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-17 * 13:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-16 * 13:09 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-16 * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-15 * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-16 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-16 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-15 * 12:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-14 * 12:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-15 * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-15 * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-14 * 12:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-13 * 12:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-14 * 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-14 * 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-13 * 12:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-12 
* 12:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-13 * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-13 * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-12 * 12:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-11 * 12:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-12 * 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-11 * 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-12 * 12:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-prometheus-7 * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-11 * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-11 * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-prometheus-7 * 12:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-8 * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-8 * 12:15 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-8 * 12:13 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-8 * 12:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 12:12 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-static-15 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-static-15 * 12:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-acme-chief-4 * 12:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-acme-chief-4 * 12:00 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-10 * 11:58 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=97) for node tools-k8s-worker-nfs-10 * 11:58 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-10 * 11:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-10 * 11:56 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-10 * 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-10 * 11:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno * 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno * 11:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-9 * 11:42 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-9 * 11:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-8 * 11:41 taavi@cloudcumin1001: END 
(PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-9 * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-8 * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-9 * 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-8 * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-8 * 11:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-7 * 11:37 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-8 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-7 * 11:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-8 * 11:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-7 * 11:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-7 * 11:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-6 * 11:33 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-6 * 11:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-5 * 11:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-5 * 11:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-6 * 11:31 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-4 * 11:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-6 * 11:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-5 * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-4 * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-5 * 11:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-4 * 11:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-4 * 11:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-3 * 11:25 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-3 * 11:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-2 * 11:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-2 * 11:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-1 * 11:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-1 * 11:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-3 * 11:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-3 * 11:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-2 * 11:20 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-2 * 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-1 * 11:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1 * 11:17 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-1 * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1 * 10:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-5 * 10:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-5 * 10:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-docker-registry-7 * 10:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-docker-registry-7 * 10:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:13 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-ingress-7 * 10:11 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-worker-nfs-43 * 10:11 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-ingress-7 * 10:09 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-worker-nfs-43 * 10:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-control-7 * 10:06 taavi@cloudcumin1001: START - Cookbook 
wmcs.openstack.migrate_server_to_ovs for server tools-k8s-control-7 * 10:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-7 * 10:03 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-43 * 10:02 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-7 * 10:01 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-redis-6 * 09:59 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-redis-6 * 09:58 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-43 * 09:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-cumin-1 * 09:52 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-cumin-1 * 09:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-k8s-haproxy-5 * 09:50 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-k8s-haproxy-5 * 09:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-harbor-1 * 09:47 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-harbor-1 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-107.tools.eqiad1.wikimedia.cloud to the cluster * 09:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-prometheus-6 * 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server 
tools-prometheus-6 * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-puppetserver-01 * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-puppetserver-01 * 09:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-puppetdb-2 * 09:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-puppetdb-2 * 09:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-mail-4 * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 09:30 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-106.tools.eqiad1.wikimedia.cloud to the cluster * 09:30 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-mail-4 * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-legacy-redirector-2 * 09:28 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-legacy-redirector-2 * 09:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-imagebuilder-2 * 09:26 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-imagebuilder-2 * 09:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-proxy-8 * 09:24 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-proxy-8 * 09:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-services-05 * 
09:23 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-services-05
* 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-package-builder-04
* 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-package-builder-04
* 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 09:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-docker-registry-8
* 09:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
* 09:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 09:19 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-docker-registry-8
* 09:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.migrate_server_to_ovs (exit_code=0) for server tools-checker-5
* 09:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster
* 09:18 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-105.tools.eqiad1.wikimedia.cloud to the cluster
* 09:18 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_server_to_ovs for server tools-checker-5
* 09:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
* 09:08 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker role in the tools cluster
* 09:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster
=== 2024-06-20 ===
* 13:09 arturo: re-deploy kyverno [[phab:T368044|T368044]]
* 12:56 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 12:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 09:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 09:08 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 09:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-06-19 ===
* 10:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 10:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 10:11 arturo: merging k8s HAproxy change https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047113
* 04:18 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 04:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 04:16 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 04:15 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-06-14 ===
* 14:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 14:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 14:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 08:15 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 08:14 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
* 07:35 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api
* 07:35 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api
=== 2024-06-12 ===
* 19:41 bd808: Rebuilding all shared Docker containers. This will among other things apply the fix for [[phab:T367345|T367345]].
* 17:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 17:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 17:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission
* 17:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission
* 16:52 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 16:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 16:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 15:24 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 15:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
* 15:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 15:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 13:52 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 13:52 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
* 13:45 taavi: hard reboot tools-k8s-control-7
* 12:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 12:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-06-11 ===
* 17:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all NFS workers
* 16:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 16:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 16:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 16:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 16:38 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
* 16:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console
* 15:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers
* 15:50 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all NFS workers
* 15:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all NFS workers
* 11:35 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 11:35 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 11:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 11:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:57 dcaro: cleaning old maintain-kubeusers configmaps
* 10:45 dcaro: cleaning up old resourcequotas
=== 2024-06-10 ===
* 09:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component kyverno
* 09:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component kyverno
=== 2024-06-07 ===
* 10:10 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api
* 10:09 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api
* 09:59 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api
* 09:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api
=== 2024-06-06 ===
* 14:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 14:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 14:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 12:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
* 10:06 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers
* 10:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers
=== 2024-06-05
=== * 16:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 13:27 dcaro: deploying toolforge-webservice 0.103.6 * 12:58 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 12:58 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 08:44 dcaro: deploying toolforge-jobs-framework-cli 16.0.10 on tools-bastion-13 * 08:41 dcaro: deploying toolforge-jobs-framework-cli 16.0.10 on tools-bastion-12 === 2024-06-04 === * 16:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:47 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 12:47 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 12:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
maintain-kubeusers * 09:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:12 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 08:12 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-06-03 === * 16:26 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 16:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 16:05 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:04 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 16:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 16:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 15:58 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:57 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:11 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:11 aborrero@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:16 wmbot~arturo@nostromo: END (PASS) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=0) * 10:15 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-reports-controller:v1.10.7 * 10:15 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-cleanup-controller:v1.10.7 * 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-background-controller:v1.10.7 * 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyvernopre:v1.10.7 * 10:14 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 10:14 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 10:13 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 10:13 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 10:13 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:37 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 09:37 wmbot~arturo@nostromo: Updating container image docker-registry.tools.wmflabs.org/toolforge-kyverno-kyverno:v1.10.7 * 09:37 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:29 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry 
(exit_code=99) * 09:29 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:29 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:29 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 09:29 wmbot~arturo@nostromo: END (FAIL) - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry (exit_code=99) * 09:28 wmbot~arturo@nostromo: START - Cookbook wmcs.toolforge.k8s.kyverno.copy_images_to_registry * 09:13 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 09:13 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 08:43 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 08:43 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission === 2024-05-29 === * 16:14 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 16:13 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 02:59 wmbot~raymond@ubuntu: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component envvars-api * 02:59 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 2024-05-28 === * 10:44 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:44 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-05-27 === * 15:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:50 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 09:22 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 09:21 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 === 2024-05-25 === * 21:33 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 21:32 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 20:38 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 20:37 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-05-23 === * 13:22 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 13:21 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2024-05-22 === * 16:36 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 16:36 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 === 2024-05-15 === * 14:17 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]]) * 14:16 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]]) * 14:11 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]]) * 14:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 ([[phab:T364822|T364822]]) * 10:26 dcaro@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:26 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-05-14 === * 13:28 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 13:28 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 07:48 dcaro: draining tools-k8s-worker-nfs-9 as it's stuck on IO * 07:48 dcaro@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=99) for node tools-k8s-worker-nfs-9 * 07:48 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-9 === 2024-05-07 === * 16:23 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 16:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:21 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-05-06 === * 12:05 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 12:04 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 08:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 08:24 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 07:24 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 07:23 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api === 
2024-05-05 === * 07:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-nginx * 07:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-nginx === 2024-05-03 === * 15:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 15:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:17 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-04-30 === * 10:56 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:55 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-04-26 === * 08:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 08:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:57 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 08:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-04-25 === * 12:57 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:57 dcaro@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component jobs-api * 09:48 taavi: update pywikibot script image to v9.1.0 [[phab:T363132|T363132]] === 2024-04-24 === * 15:30 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 15:29 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-04-18 === * 09:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 09:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-04-17 === * 20:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-50 * 20:48 andrewbogott: In response to stuck processes (NFS?), running sudo cookbook wmcs.toolforge.k8s.reboot --hostname-list tools-k8s-worker-nfs-50 --cluster-name tools * 20:48 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-50 * 15:21 dcaro: swapped login.toolforge.org to point to tools-bastion-13 * 10:48 dcaro: rebooting tools-k8s-worker-nfs-1 === 2024-04-16 === * 11:08 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-1 * 11:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-1 * 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'python3-toolforge-weld' version '1.5.0' * 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'python3-toolforge-weld' version '1.5.0' === 2024-04-15 === * 20:34 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 20:33 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
builds-api * 18:28 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 18:27 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 14:15 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:15 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 13:43 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 13:42 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 13:38 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 13:38 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 11:03 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:03 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 10:59 dcaro@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:59 dcaro@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 09:02 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2024-04-12 === * 10:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-admission * 10:14 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-admission * 09:35 wmbot~dcaro@urcuchillay: END (PASS) - 
Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component volume-admission * 09:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component volume-admission * 09:27 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 09:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 01:19 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 01:18 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 01:18 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component calico * 01:17 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component ingress-admission * 01:17 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component calico * 01:17 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component api-gateway * 01:16 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component ingress-admission * 01:16 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component api-gateway * 01:15 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component registry-admission * 01:14 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component registry-admission * 01:13 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 01:12 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 01:11 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for 
component maintain-kubeusers === 2024-04-11 === * 08:42 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 08:41 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2024-04-09 === * 17:21 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 17:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 17:11 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 17:03 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 16:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 16:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 14:23 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 14:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:23 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 14:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:22 wmbot~dcaro@urcuchillay: END (FAIL) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=99) * 14:22 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 14:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:11 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 13:43 dcaro: deployed builds-builder 0.0.94 and removed builds-admission * 13:39 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 13:38 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
builds-builder * 12:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:21 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:19 dcaro: deploying toolforge-jobs-cli 16.0.6 === 2024-04-08 === * 16:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=0) * 16:24 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 16:21 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 16:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 16:09 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_etcd_node (exit_code=99) * 16:09 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_etcd_node * 15:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 14:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 14:49 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 14:32 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 14:32 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=0) * 14:16 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 14:14 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-21 * 14:13 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-21 * 13:56 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:54 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:53 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook 
wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-56 * 13:53 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 13:52 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-56 * 13:51 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:49 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:49 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:47 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:45 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:43 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:40 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:37 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:37 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:35 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 13:32 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 13:31 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console 
(exit_code=255) * 13:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:29 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 13:29 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 13:28 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 13:24 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:19 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 13:12 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_etcd_node (exit_code=99) * 13:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_etcd_node ([[phab:T349207|T349207]]) * 10:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:26 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 08:55 dcaro_: deploy toolforge-jobs-framework-cli 16.0.5 === 2024-04-05 === * 12:15 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:15 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-04-03 === * 15:01 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 15:00 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:59 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:59 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:58 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:58 
wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:57 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:57 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:49 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:49 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:37 wmbot~raymond@ubuntu: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:37 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 11:24 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06 * 11:24 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06 * 11:23 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06 * 11:23 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06 * 11:21 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-06 * 11:21 wmbot~taavi@runko: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-06 * 09:45 taavi: rebuilding prebuild images for [[phab:T361457|T361457]] === 2024-04-02 === * 12:39 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-db-2 ([[phab:T344717|T344717]]) * 12:38 fnegri@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-db-2 ([[phab:T344717|T344717]]) * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-registry-05 * 07:54 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.remove_instance for instance tools-docker-registry-05 === 2024-03-28 === * 14:27 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-proxy-05 * 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-proxy-05 * 13:45 taavi: migrating toolforge.org floating IP from tools-proxy-06 to tools-proxy-7 [[phab:T361223|T361223]] * 13:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-proxy' * 13:30 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-proxy' * 13:25 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-proxy' * 13:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-proxy' * 12:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-registry-06 * 12:12 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-registry-06 * 11:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-docker-registry' * 11:02 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-docker-registry' === 2024-03-27 === * 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance toolserver-proxy-01 * 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance toolserver-proxy-01 === 2024-03-26 === * 16:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud * 16:47 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud * 16:41 taavi@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud * 16:39 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-docker-registry-7.tools.eqiad1.wikimedia.cloud * 16:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-docker-registry' * 16:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-docker-registry' * 12:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-bastion-13.tools.eqiad1.wikimedia.cloud * 12:54 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-bastion-13.tools.eqiad1.wikimedia.cloud * 12:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-bastion' * 12:45 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-bastion' * 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-sgebastion-11 * 12:43 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-sgebastion-11 * 10:24 taavi: point toolserver.org DNS to tools-legacy-redirector-2 [[phab:T311909|T311909]] === 2024-03-25 === * 18:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-legacy-redirector * 18:23 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-legacy-redirector * 14:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud * 14:27 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud * 14:20 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on 
tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud * 14:19 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud * 14:18 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud * 14:18 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-legacy-redirector-2.tools.eqiad1.wikimedia.cloud === 2024-03-22 === * 11:43 dcaro: restarted sssd on tools-prometheus-6 as it was stopped (error) === 2024-03-21 === * 15:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-4 * 15:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-4 * 15:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=0) for node tools-k8s-haproxy-3 * 15:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node tools-k8s-haproxy-3 * 15:42 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_haproxy_node (exit_code=99) for node toolsbeta-k8s-haproxy-3 * 15:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_haproxy_node for node toolsbeta-k8s-haproxy-3 * 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) * 15:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node * 12:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_haproxy_node (exit_code=0) * 12:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_haproxy_node === 2024-03-20 === * 13:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-checker-04 * 13:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-checker-04 * 12:30 taavi: move checker service 
address to tools-checker-5 * 11:24 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 11:24 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-checker-5.tools.eqiad1.wikimedia.cloud * 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-checker-5.tools.eqiad1.wikimedia.cloud * 10:40 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-checker-5.tools.eqiad1.wikimedia.cloud * 10:39 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-checker-5.tools.eqiad1.wikimedia.cloud * 10:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=0) with prefix 'tools-checker' * 10:34 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker' * 10:33 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-checker' * 10:33 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker' * 10:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 10:32 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase * 10:22 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.create_instance_with_prefix (exit_code=99) with prefix 'tools-checker' * 10:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.create_instance_with_prefix with prefix 'tools-checker' === 2024-03-19 === * 21:28 taavi: kick off full container image rebuild for https://gerrit.wikimedia.org/r/1012753 (python3 backwards compat in lighttpd images) and https://gerrit.wikimedia.org/r/1010690 (add procps to base images) * 11:22 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.vps.remove_instance (exit_code=0) for instance tools-static-14 * 11:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-static-14 * 11:19 taavi: point dev.toolforge.org to tools-bastion-12 [[phab:T314665|T314665]] * 10:26 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:25 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 09:38 dcaro: pushed docker-registry.tools.wmflabs.org/cloud-cicd-py311bookworm-tox:latest and docker-registry.tools.wmflabs.org/cloud-cicd-debian-builder-bookworm:2024-03-24.1 ([[phab:T360405|T360405]]) === 2024-03-18 === * 13:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:30 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-ingress-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-ingress-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 13:14 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-104 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:13 taavi: restart harbor services after 
docker service restart * 13:13 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-104 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:13 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-103 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:12 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-103 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:12 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-102 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-102 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:03 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-56 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:02 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-56 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:02 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-55 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:01 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-55 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-54 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-54 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 13:00 aborrero@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-53 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:59 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-53 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:59 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-52 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:58 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-52 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:58 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-51 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:57 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-51 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:57 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-50 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-50 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:56 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-49 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:55 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-49 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-48 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade 
for node tools-k8s-worker-nfs-48 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:53 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-47 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-47 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-46 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-46 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-45 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-45 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-44 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-44 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:49 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-43 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-43 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-42 from 
1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:47 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-42 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-41 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-41 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:44 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:36 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-40 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:35 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-40 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-39 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:34 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-39 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:34 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-38 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-filesystemtest-1 * 12:33 aborrero@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-38 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:33 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-37 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-filesystemtest-1 * 12:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-37 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-36 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-36 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-35 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-35 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:29 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-34 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:28 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-34 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:28 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-33 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:27 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-33 from 1.23.17 to 1.24.17 
([[phab:T307651|T307651]]) * 12:27 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-32 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-32 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:26 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-31 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:25 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-31 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:25 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-30 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:24 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-30 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:24 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-29 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-29 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-28 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:22 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-28 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:22 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-27 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:21 
aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-27 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:21 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-26 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:20 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-26 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:20 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-25 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:19 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-25 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:19 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-24 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:18 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-24 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:18 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-23 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-acme-chief-4.tools.eqiad1.wikimedia.cloud * 12:15 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-23 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:15 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-22 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:14 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on 
tools-acme-chief-4.tools.eqiad1.wikimedia.cloud * 12:11 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-22 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:11 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:05 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-21 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 12:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 12:04 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:03 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:01 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 12:01 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 12:00 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 12:00 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-acme-chief-3.tools.eqiad1.wikimedia.cloud * 11:56 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:55 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-20 from 1.23.17 to 1.24.17 
([[phab:T307651|T307651]]) * 11:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-20 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-19 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-19 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:53 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-18 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:52 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-18 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:52 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-17 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:51 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-17 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:51 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-16 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-16 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-15 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:49 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-15 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:49 
aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-14 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:48 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-14 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:48 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-13 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:47 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-13 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-12 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-12 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-11 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:45 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-11 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:45 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-10 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:43 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-10 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:43 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-9 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:42 aborrero@cloudcumin1001: START - 
Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-9 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-8 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-8 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:41 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-7 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:40 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-7 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:40 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-6 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:39 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-6 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:39 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:33 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-5 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:33 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-4 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:32 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-4 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:32 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for 
node tools-k8s-worker-nfs-3 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:31 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-3 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:31 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-2 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:30 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-2 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:30 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-worker-nfs-1 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-worker-nfs-1 from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 11:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:23 taavi: point tools-static proxy to tools-static-15 (bookworm) [[phab:T311913|T311913]] * 11:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-static-15.tools.eqiad1.wikimedia.cloud * 11:17 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-9 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:13 aborrero@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:08 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-8 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 11:01 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 11:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 11:00 aborrero@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=97) for component jobs-api * 11:00 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:00 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=0) for node tools-k8s-control-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-control-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.worker.upgrade (exit_code=99) for node tools-k8s-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:53 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.upgrade for node tools-k8s-7 from 1.23.17 to 1.24.17 ([[phab:T359638|T359638]]) * 10:47 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.prepare_upgrade (exit_code=0) for cluster tools upgrade from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 10:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.prepare_upgrade for cluster tools upgrade from 1.23.17 to 1.24.17 ([[phab:T307651|T307651]]) * 10:04 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-bastion-12.tools.eqiad1.wikimedia.cloud * 10:03 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.refresh_puppet_certs on tools-bastion-12.tools.eqiad1.wikimedia.cloud * 09:27 taavi: deleted shutdown grid engine VMs [[phab:T314664|T314664]] === 2024-03-15 === * 10:50 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:50 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-03-14 === * 17:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'misctools' version '1.48' * 17:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'misctools' version '1.48' * 15:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-docker-imagebuilder-01 * 15:16 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01 * 15:11 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-docker-imagebuilder-01 * 15:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01 * 15:10 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.remove_instance (exit_code=99) for instance tools-docker-imagebuilder-01 * 15:09 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-docker-imagebuilder-01 * 11:02 taavi: stop grid related VMs [[phab:T314664|T314664]] * 11:01 taavi: disable grid access for remaining tools still running on the grid [[phab:T314664|T314664]] === 2024-03-13 === * 19:21 andrewbogott: shutting down old puppet infra: tools-puppetmaster-02 and tools-puppetdb-1. These can be deleted in a week or two presuming everything remains stable. 
=== 2024-03-12 === * 12:38 taavi: hard reboot tools-prometheus-6 * 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-03-11 === * 16:46 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 16:46 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 13:20 arturo: cached registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.6.0 as docker-registry.tools.wmflabs.org/kube-state-metrics:v2.6.0 in the docker registry for [[phab:T359798|T359798]] === 2024-03-09 === * 12:48 taavi: hard reboot tools-sgebastion-10 due to stuck NFS procs === 2024-03-08 === * 12:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 12:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-03-07 === * 14:33 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 13:42 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 13:41 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-03-06 === * 10:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-32 * 10:47 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_grid_node (exit_code=1) for tools-sgeweblight-10-17, tools-sgeweblight-10-32 * 10:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for 
tools-sgeweblight-10-17, tools-sgeweblight-10-32 * 10:34 taavi: rebuilding all docker images for https://gerrit.wikimedia.org/r/c/operations/docker-images/toollabs-images/+/1005952 ([[phab:T293552|T293552]]) + normal package updates * 09:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 09:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 09:42 taavi: reboot tools-sgeexec-10-20, -21, -23, sgeweblight-10-32 due to stuck nfs procs === 2024-03-05 === * 16:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud * 16:11 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud * 16:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 16:09 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 16:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) * 16:07 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase * 16:06 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.openstack.quota_increase (exit_code=97) ([[phab:T357901|T357901]]) * 16:06 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T357901|T357901]]) * 16:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=99) on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud * 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-imagebuilder-2.tools.eqiad1.wikimedia.cloud === 2024-03-04 === * 17:56 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 17:56 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component 
maintain-kubeusers * 16:57 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 16:57 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:43 taavi: reboot tools-sgegrid-shadow due to high number of procs in D state === 2024-03-03 === * 10:38 dcaro: reboot tools-k8s-worker-nfs-55 got nfs lockup (logrotate in D state) === 2024-03-01 === * 21:14 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 21:14 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api === 2024-02-29 === * 14:36 dcaro: deploy webservice 0.103.3 === 2024-02-28 === * 11:57 dcaro: deploy tools-webservice 0.103.2 with probes ([[phab:T341919|T341919]]) * 00:46 bd808@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 00:46 bd808@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-02-26 === * 09:54 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) ([[phab:T284656|T284656]]) * 09:54 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node ([[phab:T284656|T284656]]) * 09:35 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster * 09:35 aborrero@cloudcumin1001: Added a new k8s control tools-k8s-control-9.tools.eqiad1.wikimedia.cloud to the cluster * 09:26 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster ([[phab:T284656|T284656]]) === 2024-02-23 === * 14:19 taavi: remove isc-dhcp-server (server, not client) from tools-db-2 * 13:32 taavi: remove toolschecker alerts for grid engine jobs [[phab:T358333|T358333]] === 2024-02-22 === * 14:26 
sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 14:26 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 14:24 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:24 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:17 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-api * 14:17 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 14:07 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component envvars-api * 14:07 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 14:03 sstefanova@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component envvars-api * 14:03 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 11:23 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) ([[phab:T284656|T284656]]) * 11:23 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node ([[phab:T284656|T284656]]) * 11:15 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 11:15 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-104.tools.eqiad1.wikimedia.cloud to the cluster * 11:06 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 10:52 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:51 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.k8s.component.deploy for component jobs-api * 09:39 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster * 09:39 aborrero@cloudcumin1001: Added a new k8s control tools-k8s-control-8.tools.eqiad1.wikimedia.cloud to the cluster * 09:29 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster ([[phab:T284656|T284656]]) * 08:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-51 * 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-51 * 08:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-38 * 08:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-38 * 08:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-nfs-25 * 08:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-nfs-25 === 2024-02-21 === * 17:07 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 17:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 15:48 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component envvars-api * 15:48 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component envvars-api * 14:41 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:34 wmbot~dcaro@urcuchillay: END (PASS) - 
Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 14:21 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 14:20 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 09:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-control-4 * 09:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-control-4 * 09:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a control role in the tools cluster * 09:20 taavi@cloudcumin1001: Added a new k8s control tools-k8s-control-7.tools.eqiad1.wikimedia.cloud to the cluster * 09:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a control role in the tools cluster === 2024-02-20 === * 16:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 16:12 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-103.tools.eqiad1.wikimedia.cloud to the cluster * 16:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.worker.drain (exit_code=0) for node tools-k8s-worker-102 * 16:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.worker.drain for node tools-k8s-worker-102 * 16:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-101 * 15:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-101 * 15:49 taavi@cloudcumin1001: END (PASS) - Cookbook 
wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 15:48 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-102.tools.eqiad1.wikimedia.cloud to the cluster * 15:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-102 * 15:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-102 * 15:38 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker role in the tools cluster * 15:38 taavi@cloudcumin1001: Added a new k8s worker tools-k8s-worker-102.tools.eqiad1.wikimedia.cloud to the cluster * 15:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 15:23 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud * 15:21 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud * 12:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:57 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-56.tools.eqiad1.wikimedia.cloud to the cluster * 12:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-100 * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-100 * 12:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:40 taavi@cloudcumin1001: Added a new k8s 
worker-nfs tools-k8s-worker-nfs-55.tools.eqiad1.wikimedia.cloud to the cluster * 12:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-99 * 12:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-99 * 12:29 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:29 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-54.tools.eqiad1.wikimedia.cloud to the cluster * 12:20 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:19 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-98 * 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-98 * 12:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:18 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-53.tools.eqiad1.wikimedia.cloud to the cluster * 12:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-97 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-97 * 11:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-52.tools.eqiad1.wikimedia.cloud to the cluster * 11:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a 
worker-nfs role in the tools cluster * 11:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-96 * 11:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-96 * 11:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:36 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-51.tools.eqiad1.wikimedia.cloud to the cluster * 11:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:26 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-50.tools.eqiad1.wikimedia.cloud to the cluster * 11:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:16 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-49.tools.eqiad1.wikimedia.cloud to the cluster * 11:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-95 * 11:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-95 * 10:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-94 * 10:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-94 * 10:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host 
tools-k8s-worker-93 * 10:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-93 * 10:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 10:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-48.tools.eqiad1.wikimedia.cloud to the cluster * 10:45 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-92 * 10:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-92 * 09:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-ingress-6 * 09:52 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-6 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-9.tools.eqiad1.wikimedia.cloud to the cluster * 09:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-47.tools.eqiad1.wikimedia.cloud to the cluster * 09:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 09:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-91 * 09:29 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-91 * 09:15 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:15 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-46.tools.eqiad1.wikimedia.cloud to the cluster * 09:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:02 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 09:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-90 * 08:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-90 * 08:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:57 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-45.tools.eqiad1.wikimedia.cloud to the cluster * 08:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-89 * 08:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-89 * 08:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:47 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-44.tools.eqiad1.wikimedia.cloud to the cluster * 08:38 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-88 * 08:36 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-88 === 2024-02-19 === * 19:04 wmbot~raymond@ubuntu: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 19:03 wmbot~raymond@ubuntu: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 13:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-ingress-5 * 13:16 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-5 * 13:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:09 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-43.tools.eqiad1.wikimedia.cloud to the cluster * 12:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-87 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-87 * 12:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-42.tools.eqiad1.wikimedia.cloud to the cluster * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-86 * 12:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-86 * 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:44 taavi@cloudcumin1001: Added a new k8s worker-nfs 
tools-k8s-worker-nfs-41.tools.eqiad1.wikimedia.cloud to the cluster * 12:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) ([[phab:T357901|T357901]]) * 12:33 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase ([[phab:T357901|T357901]]) * 12:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud * 12:32 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud * 12:24 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:20 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-85 * 12:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-85 * 12:18 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:18 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-40.tools.eqiad1.wikimedia.cloud to the cluster * 12:08 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-84 * 12:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-84 * 12:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:04 taavi@cloudcumin1001: Added a new k8s worker-nfs 
tools-k8s-worker-nfs-39.tools.eqiad1.wikimedia.cloud to the cluster * 11:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-83 * 11:53 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-83 * 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:50 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-38.tools.eqiad1.wikimedia.cloud to the cluster * 11:40 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:40 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-82 * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-82 * 11:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:39 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-37.tools.eqiad1.wikimedia.cloud to the cluster * 11:28 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:28 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-81 * 11:27 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-81 * 09:03 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 09:03 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 08:57 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy 
(exit_code=0) for component maintain-kubeusers * 08:57 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-02-16 === * 15:28 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 15:27 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 12:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 12:21 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-8.tools.eqiad1.wikimedia.cloud to the cluster * 12:14 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 10:37 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 10:32 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 10:32 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 10:31 wmbot~dcaro@urcuchillay: END (ERROR) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=255) * 10:31 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:59 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-36.tools.eqiad1.wikimedia.cloud to the cluster * 09:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:49 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-80 * 09:49 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host 
tools-k8s-worker-80 * 09:45 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:45 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-35.tools.eqiad1.wikimedia.cloud to the cluster * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:35 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-79 * 09:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-79 * 09:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:24 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-34.tools.eqiad1.wikimedia.cloud to the cluster * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:06 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-78 * 09:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-78 * 09:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:05 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-33.tools.eqiad1.wikimedia.cloud to the cluster * 08:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:55 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-77 * 08:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-77 === 2024-02-15 === * 13:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host 
tools-k8s-ingress-4 * 13:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-ingress-4 * 13:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:02 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-32.tools.eqiad1.wikimedia.cloud to the cluster * 12:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-76 * 12:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-76 * 12:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:44 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-31.tools.eqiad1.wikimedia.cloud to the cluster * 12:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-75 * 12:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-75 * 11:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a ingress role in the tools cluster * 11:37 taavi@cloudcumin1001: Added a new k8s ingress tools-k8s-ingress-7.tools.eqiad1.wikimedia.cloud to the cluster * 11:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster * 11:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-ingress-7 * 11:29 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-ingress-7 * 11:29 taavi@cloudcumin1001: END (FAIL) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a ingress role in the tools cluster * 11:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a ingress role in the tools cluster === 2024-02-14 === * 19:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-17, tools-sgeweblight-10-30 * 16:35 taavi: kill jobs user 'wikishizhao' is running directly on the grid per https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rules #3 * 16:30 taavi: reboot tools-sgeexec-10-23 due to high load * 09:14 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud * 09:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0) * 09:07 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.openstack.cloudvirt.vm_console * 09:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:07 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-30.tools.eqiad1.wikimedia.cloud to the cluster * 08:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-74 * 08:55 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-74 * 08:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:54 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-29.tools.eqiad1.wikimedia.cloud to the cluster * 08:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 08:44 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-73 * 08:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-73 * 08:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:43 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-28.tools.eqiad1.wikimedia.cloud to the cluster * 08:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-72 * 08:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-72 * 08:32 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:32 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-27.tools.eqiad1.wikimedia.cloud to the cluster * 08:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-71 * 08:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-71 * 08:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:21 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-26.tools.eqiad1.wikimedia.cloud to the cluster * 08:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:08 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-70 * 08:07 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-70 * 08:05 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 08:05 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-25.tools.eqiad1.wikimedia.cloud to the cluster * 07:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 07:54 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-69 * 07:54 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-69 * 07:53 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 07:53 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-24.tools.eqiad1.wikimedia.cloud to the cluster * 07:44 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 07:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-68 * 07:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-68 === 2024-02-13 === * 15:42 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-67 * 15:41 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-67 * 15:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 15:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-23.tools.eqiad1.wikimedia.cloud to the cluster * 15:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:31 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-66 * 15:30 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-66 * 15:30 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 15:30 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-22.tools.eqiad1.wikimedia.cloud to the cluster * 15:19 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 15:17 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-65 * 15:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-65 * 09:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:36 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-21.tools.eqiad1.wikimedia.cloud to the cluster * 09:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-64 * 09:25 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-64 === 2024-02-12 === * 14:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 14:58 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-20.tools.eqiad1.wikimedia.cloud to the cluster * 14:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-62 * 14:47 taavi@cloudcumin1001: START - Cookbook 
wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-62 * 14:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 14:47 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-19.tools.eqiad1.wikimedia.cloud to the cluster * 14:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 14:26 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-61 * 14:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-61 * 13:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-60 * 13:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-60 * 13:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:43 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-18.tools.eqiad1.wikimedia.cloud to the cluster * 13:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-59 * 13:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-59 * 13:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-58 * 13:32 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-58 * 13:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:22 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-17.tools.eqiad1.wikimedia.cloud 
to the cluster * 13:12 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-57 * 13:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-57 * 13:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-56 * 13:09 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-56 * 13:09 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 13:09 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-16.tools.eqiad1.wikimedia.cloud to the cluster * 12:59 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:59 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-55 * 12:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-55 * 12:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-54 * 12:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-54 * 12:56 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-15.tools.eqiad1.wikimedia.cloud to the cluster * 12:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-15 * 12:45 taavi@cloudcumin1001: START - Cookbook 
wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-15 * 12:44 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 12:37 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 12:37 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-53 * 12:36 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-53 * 12:36 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-52 * 12:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-52 * 10:51 sstefanova@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-api * 10:50 sstefanova@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-api * 10:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 10:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers === 2024-02-11 === * 11:39 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 11:39 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-02-09 === * 18:03 andrewbogott: updated the default security group, removing the 0.0.0.0/0 rule allowing port 22 access everywhere, replaced it with a 172.16.0.0/21 rule * 13:06 taavi: reboot tools-sgecron-2 due to high load * 10:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component image-config * 10:34 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component image-config * 09:56 taavi@cloudcumin1001: 
END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:56 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-14.tools.eqiad1.wikimedia.cloud to the cluster * 09:47 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:47 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-51 * 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-51 * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-50 * 09:46 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-50 * 08:56 dcaro: restart tools-k8s-worker-50 due to some processes stuck in D state === 2024-02-08 === * 13:03 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 13:03 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors * 09:46 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:46 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-13.tools.eqiad1.wikimedia.cloud to the cluster * 09:35 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:34 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-49 * 09:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-49 * 09:33 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-48 * 09:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-48 * 09:32 taavi@cloudcumin1001: END (PASS) - 
Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:32 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-12.tools.eqiad1.wikimedia.cloud to the cluster * 09:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-47 * 09:22 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-47 * 09:22 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-46 * 09:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-46 * 09:21 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:21 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-11.tools.eqiad1.wikimedia.cloud to the cluster * 09:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 09:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-45 * 09:11 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-45 * 09:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-44 * 09:10 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-44 * 09:10 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 09:10 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-10.tools.eqiad1.wikimedia.cloud to the cluster * 09:00 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a 
worker-nfs role in the tools cluster * 08:59 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 08:58 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 08:58 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-43 * 08:57 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-43 * 08:57 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-42 * 08:56 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-42 === 2024-02-07 === * 21:33 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all workers * 18:00 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:58 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-9 * 17:58 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-9 * 17:24 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 17:23 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:05 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 17:05 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:03 taavi@cloudcumin1001: END (ERROR) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=97) for all workers * 17:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers * 17:01 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for all workers * 16:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all workers === 
2024-02-06 === * 13:09 aborrero@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for all nodes ([[phab:T356507|T356507]]) * 11:50 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 11:50 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 11:16 aborrero@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.reboot for all nodes ([[phab:T356507|T356507]]) === 2024-01-31 === * 14:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 14:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-01-30 === * 19:24 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 19:24 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-9.tools.eqiad1.wikimedia.cloud to the cluster * 19:17 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 19:16 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-9 * 19:16 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-9 * 19:16 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 19:13 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 19:12 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 19:12 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-8.tools.eqiad1.wikimedia.cloud to the cluster * 19:04 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs 
role in the tools cluster * 19:04 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-8 * 19:03 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-8 * 18:51 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:48 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:48 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance tools-k8s-worker-nfs-8 * 18:47 taavi@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance tools-k8s-worker-nfs-8 * 18:46 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 18:42 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:41 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 18:41 taavi@cloudcumin1001: Added a new k8s worker-nfs tools-k8s-worker-nfs-7.tools.eqiad1.wikimedia.cloud to the cluster * 18:33 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 18:29 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-41 * 18:29 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-41 * 18:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-40 * 18:23 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-40 * 18:22 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-39 * 18:22 
andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-39 * 18:18 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-38 * 18:17 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-38 * 18:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-37 * 18:08 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-37 * 15:16 dcaro: restart harbor now that the db is clean ([[phab:T356037|T356037]]) * 15:14 dcaro: restart harbor now that the db is clean ([[phab:T3543|T3543]]) * 13:08 taavi: create no-op DMARC record [[phab:T354112|T354112]] * 12:39 dcaro: rebuilding all the toolforge images ([[phab:T354320|T354320]]) * 10:16 dcaro: restarting harbor and flushing redis to regenerate cache data ([[phab:T356037|T356037]]) * 09:33 dcaro: cleaning up old schedules on harbor ([[phab:T356037|T356037]]) === 2024-01-29 === * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-36 * 19:46 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=99) for host tools-k8s-worker-36 * 19:46 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-36 * 14:36 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-mail-4.tools.eqiad1.wikimedia.cloud * 14:34 wmbot~taavi@runko: START - Cookbook wmcs.vps.refresh_puppet_certs on tools-mail-4.tools.eqiad1.wikimedia.cloud * 12:06 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 12:06 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-6.tools.eqiad1.wikimedia.cloud to the cluster * 11:55 wmbot~taavi@runko: START - Cookbook 
wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:51 wmbot~taavi@runko: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 11:51 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:37 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:37 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-5.tools.eqiad1.wikimedia.cloud to the cluster * 11:26 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:23 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 11:22 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-4.tools.eqiad1.wikimedia.cloud to the cluster * 11:12 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:12 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-35 * 11:10 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-35 * 11:10 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-34 * 11:09 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-34 * 11:09 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-33 * 11:07 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-33 * 11:06 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.remove_k8s_node (exit_code=0) for host tools-k8s-worker-32 * 11:04 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-32 * 11:01 wmbot~taavi@runko: 
START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-31 * 10:59 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.remove_k8s_node for host tools-k8s-worker-30 * 10:57 wmbot~taavi@runko: END (FAIL) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=99) for a worker-nfs role in the tools cluster * 10:56 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:51 wmbot~taavi@runko: END (PASS) - Cookbook wmcs.toolforge.add_k8s_node (exit_code=0) for a worker-nfs role in the tools cluster * 10:51 wmbot~taavi@runko: Added a new k8s worker-nfs tools-k8s-worker-nfs-3.tools.eqiad1.wikimedia.cloud to the cluster * 10:46 blancadesal: increased harbor quota for wd-shex-infer to 2GiB * 10:44 blancadesal: increased harbor quota for lucaswerkmeister-test to 2GiB * 10:31 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 10:31 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 10:31 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-01-26 === * 10:56 taavi: copy helmfile_0.144.0-1_all to bookworm-tools, bookworm-toolsbeta === 2024-01-25 === * 13:17 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 13:04 wmbot~taavi@runko: START - Cookbook wmcs.toolforge.add_k8s_node for a worker-nfs role in the tools cluster * 11:13 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:12 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-24 === * 09:54 dcaro: deploy toolforge-jobs-framework-cli 16.0.1 === 2024-01-23 === * 19:11 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 19:11 
taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 14:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 14:43 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component wmcs-k8s-metrics * 14:43 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component wmcs-k8s-metrics * 13:31 taavi: rebooting tools-sgeexec-10-21, tools-sgeexec-10-22 * 12:58 dcaro: deployed toolforge-envvars-cli 0.0.4 * 10:23 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 10:23 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-19 === * 15:40 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 15:40 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 12:11 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 12:10 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api === 2024-01-18 === * 12:24 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.add_k8s_node for a worker role in the tools cluster * 12:21 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeexec-10-17 === 2024-01-17 === * 18:16 dhinus: increase volume quotas for toolsdb [[phab:T344717|T344717]] * 18:14 fnegri@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) ([[phab:T344717|T344717]]) * 18:14 fnegri@cloudcumin1001: START - Cookbook wmcs.openstack.quota_increase 
([[phab:T344717|T344717]]) * 14:34 wmbot~dcaro@urcuchillay: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 14:34 wmbot~dcaro@urcuchillay: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 08:56 taavi: update all pre-built docker images [[phab:T352886|T352886]] === 2024-01-15 === * 09:18 taavi: reboot stuck tools-k8s-worker-84 === 2024-01-12 === * 09:07 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.apt.copy_to_main_repo (exit_code=0) for package 'toolforge-builds-cli' version '0.0.12' * 09:07 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.apt.copy_to_main_repo for package 'toolforge-builds-cli' version '0.0.12' === 2024-01-11 === * 17:30 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 17:12 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:12 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 15:14 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 15:13 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder === 2024-01-10 === * 22:02 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component maintain-kubeusers * 22:02 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component maintain-kubeusers * 09:17 taavi: reboot tools-k8s-worker-98 === 2024-01-09 === * 23:37 andrewbogott: restarting harbor-db in an attempt to reform harbor -- [[phab:T354714|T354714]] * 23:30 andrewbogott: rebooting tools-harbor-1 in a feeble attempt to get it to work (docker-compose can't restart it) * 23:12 andrew@cloudcumin1001: END (FAIL) - Cookbook 
wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds-builder * 23:12 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 23:11 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=99) for component builds.builder * 23:11 andrew@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds.builder * 17:31 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 17:30 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:13 taavi: reboot tools-sgeexec-10-17 due to high load === 2024-01-08 === * 12:26 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.remove_grid_node for tools-sgeweblight-10-27, tools-sgeweblight-10-28 * 10:51 taavi@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component jobs-api * 10:51 taavi@cloudcumin1001: START - Cookbook wmcs.toolforge.k8s.component.deploy for component jobs-api * 10:17 taavi: reboot tools-sgeexec-10-21 === 2024-01-05 === * 14:55 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 14:55 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 11:56 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.toolforge.k8s.component.deploy (exit_code=0) for component builds-builder * 11:55 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.toolforge.k8s.component.deploy for component builds-builder * 10:29 fnegri@cloudcumin1001: END (PASS) - Cookbook wmcs.toolforge.grid.cleanup_queue_errors (exit_code=0) * 10:29 fnegri@cloudcumin1001: START - Cookbook wmcs.toolforge.grid.cleanup_queue_errors === 2024-01-04 === * 10:11 dcaro: deploy toolforge-envvars-cli 0.0.3 === 2024-01-03 === * 21:22 
andrewbogott: truncating 200 logfiles to 5M on tools nfs * 21:17 andrewbogott: deleting many stray core dumps throughout nfs storage === 2024-01-02 === * 11:06 dcaro: restart toolsdb database to flush connections ([[phab:T354176|T354176]]) * 10:42 dcaro: flushed the redis db on tools-harbor-1 ([[phab:T354176|T354176]]) * 10:37 dcaro: hard reboot tools-harbor-1 * 10:13 dhinus: hard reboot tools-harbor-1 === 2024-01-01 === * 15:55 andrewbogott: rebooting tools-harbor-1, [[phab:T354151|T354151]] ==Archives== * [[Nova Resource:Tools/SAL/Archive 1|Archive 1]] (2013-2014) * [[Nova Resource:Tools/SAL/Archive 2|Archive 2]] (2015-2017) * [[Nova Resource:Tools/SAL/Archive 3|Archive 3]] (2018-2019) * [[Nova Resource:Tools/SAL/Archive 4|Archive 4]] (2020-2021) * [[Nova Resource:Tools/SAL/Archive 5|Archive 5]] (2022-2023) </noinclude> {{SAL|Project Name=tools}} <noinclude>[[Category:SAL]]</noinclude> e44tv7sppxpfkpdnxos6m4s18wuxbbq Deployments 0 4108 2414261 2414083 2026-05-15T14:43:27Z ScheduleDeploymentBot 37566 Add [[gerrit:1287895]] to Monday, May 18 UTC afternoon backport window 2414261 wikitext text/x-wiki {{Navigation MediaWiki deployment}} This page tracks '''upcoming''' '''deployments''' of software to the [[:m:Special:SiteMatrix|Wikimedia Foundation servers]]. == Getting started == Ensure you joined the {{irc|wikimedia-operations}} IRC channel as all deployment-related communications happen there. If you need help, contact [[:mw:Wikimedia Release Engineering Team|Release Engineering]] on IRC at {{irc|wikimedia-releng}}; and ping Tyler (<code>thcipriani</code>). * '''MediaWiki is deployed weekly''' through the [[/Train|Deployment Train]]. Other services follow their own schedule. * '''Times are pinned to San Francisco''', thus the UTC time changes in March and November per [[:en:Daylight saving time in the United States|DST]]. * '''Prefer regular [[Backport windows]]''' over adding new windows. 
To request deployment of a config change or backport, add your username and Gerrit URL to one of the backport windows on this page. You must be online in #wikimedia-operations on IRC during your deployment and install [[WikimediaDebug]] ahead of time. The #wikimedia-operations channel requires you to [[:m:IRC/Instructions#Register your nickname, identify, and enforce|register your nickname]] before you can join.
** You can use the '''backport scheduling tool''' to more easily edit this page: <div style="text-align: center; margin: 1em 0">{{Clickable button 2|:toollabs:schedule-deployment|Schedule a backport|class=mw-ui-progressive}}</div>
* Tasks that meet [[/Inclusion criteria|Inclusion criteria]] '''require their own windows''', which includes long-running tasks. '''Schedule more time''' than you think you need to account for delays and setbacks; we recommend one hour for most tasks.
** To create or modify a recurring deploy window, send a patchset to [[:gitlab:repos/releng/release/-/blob/main/make-deployment-calendar/deployments-calendar.yaml|deployments-calendar.yaml file]] in <code>repos/releng/release.git</code>.
** To create a one-off window, simply edit this page accordingly.
** '''Announce''' changes to the [[mail:ops|ops mailing list]] ahead of time if you anticipate or are uncertain about noticeable impacts to database load, HTTP caching, or the introduction of new cookies.
** '''Announce''' deployments of major features to the community via [[:m:Tech/News/Next|Tech News]] and/or via other [[:mw:Wikimedia_Product_Guidance/Communication_channels|Product communication channels]].
* '''Something went wrong?''' See [[Incident response]]. Is there a user-impacting problem? Communicate in the {{irc|wikimedia-operations}} IRC channel. If there is a Phabricator task, ensure [[:phab:tag/wikimedia-incident/|#Wikimedia-Incident]] is tagged, and consider setting the [[:mw:Phabricator/Project_management#Priority_levels|Unbreak Now]] priority.
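Because window times are pinned to San Francisco, the UTC time of a window shifts with daylight saving time. The following is a minimal sketch of that conversion, not part of any deployment tooling. It assumes that the <code>SF</code> suffix in a card's <code>|when=</code> value means the <code>America/Los_Angeles</code> time zone and that <code>|length=</code> is given in hours. The function name <code>window_utc</code> is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: "SF" in the calendar cards refers to this IANA zone.
SF = ZoneInfo("America/Los_Angeles")


def window_utc(when: str, length_hours: float):
    """Return the (start, end) of a deployment window in UTC.

    `when` is a card's '|when=' value, e.g. '2026-05-11 06:00 SF'.
    `length_hours` is the card's '|length=' value (assumed to be hours).
    """
    stamp, _, zone = when.rpartition(" ")
    if zone != "SF":
        raise ValueError(f"unexpected zone suffix: {zone!r}")
    # Attach the San Francisco zone, then convert to UTC.
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=SF)
    start = local.astimezone(timezone.utc)
    # Add the duration in UTC so a DST change cannot skew the length.
    end = start + timedelta(hours=length_hours)
    return start, end


if __name__ == "__main__":
    start, end = window_utc("2026-05-11 06:00 SF", 1)
    print(start.isoformat(), end.isoformat())
```

Under these assumptions, a 06:00 SF window starts at 13:00 UTC in May and at 14:00 UTC in January, which is the March/November shift described above.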
__TOC__ {{anchor|Next Week|Near Term|Near term|Near-term}}{{clear}} [[Category:Deployment]] {{Note|content=Subscribe in Google Calendar via <code>wikimedia.org_rudis09ii2mm5fk4hgdjeh1u64@group.calendar.google.com</code>.<br>This may not include one-off windows. '''If there are differences, then the wiki page is canonical and correct'''.}} ==Week of May 11== ==={{Deployment_day|date=2026-05-10}}=== {{Deployment calendar event card |when=2026-05-10 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2026-05-11}}=== {{Deployment calendar event card |when=2026-05-11 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|sfaci|sfaci}} {{deploy|type=config|gerrit=1278704|title=WikiLambdaApi: update stream configuration|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285352|title=WikiLambdaApi instrument: Sets the custom schemaID|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285406|title=editSaves: getExperiment returns a promise now|status=}} - {{phabricator|T425785}} {{ircnick|dyepezg|Daniel Yepez Garces}} {{deploy|type=config|gerrit=1283048|title=Enabling RSS extension for cowikimedia chapter|status=}} - {{phabricator|T425440}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-11 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-11 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|yerdua_wmde|yerdua_wmde}} {{deploy|type=config|gerrit=1270482|title=Enable and configure WikiProjects prototype on WikiData beta|status=}} - {{phabricator|T421850}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1284900|title=Completely disable MediaWiki page patrolling functions on German Wikipedia|status=}} - {{phabricator|T316393}} {{ircnick|MatmaRex|Bartosz}} {{deploy|type=1.47.0-wmf.1|gerrit=1285460|title=Prevent username registration if the username previously existed|status=}} - {{phabricator|T196386}} {{deploy|type=1.47.0-wmf.1|gerrit=1285461|title=Prevent username registration if the username previously existed (v2)|status=}} - {{phabricator|T196386}} {{deploy|type=config|gerrit=1285448|title=Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes|status=}} - {{phabricator|T196386}} {{deploy|type=1.47.0-wmf.1|gerrit=1285462|title=API: Introduce list=globalusers|status=}} - {{phabricator|T261752}} {{deploy|type=1.47.0-wmf.1|gerrit=1285761|title=list=globalusers: Avoid querying group permissions with empty group list|status=}} - {{phabricator|T425859}} {{ircnick|sfaci|sfaci}} {{deploy|type=config|gerrit=1278704|title=WikiLambdaApi: update stream configuration|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285352|title=WikiLambdaApi instrument: Sets the custom schemaID|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285406|title=editSaves: getExperiment returns a promise now|status=}} - {{phabricator|T425785}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event 
card |when=2026-05-11 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-11 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-05-11 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-11 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-05-11 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|Sergi0|Sergio Gimeno}} {{deploy|type=1.47.0-wmf.1|gerrit=1285743|title=loggedOutWarning: set lastEditor used earlier|status=}} - {{phabricator|T425604}} {{ircnick|jan_drewniak|Jan Drewniak}} * {{gerrit|1285848}} [config] Portal banner deploy {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-11 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-05-11 16:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-11 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.47.0-wmf.2</code> }} {{Deployment calendar event card |when=2026-05-11 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.47.0-wmf.2</code> to testwikis }} {{Deployment calendar event card |when=2026-05-11 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-05-11 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-11 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-12}}=== {{Deployment calendar event card |when=2026-05-12 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|dcausse|dcausse}} {{deploy|type=config|gerrit=1284628|title=cirrus: use a keywork tokenizer for the plain field for autocomplete|status=}} - {{phabricator|T420427}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-12 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1|1.47.0-wmf.1}} * group0 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-12 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-12 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-12 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=config|gerrit=1286334|title=ArticleGuidance: set sparql endpoint|status=}} - {{phabricator|T425389}} {{ircnick|yerdua_wmde|yerdua_wmde}} {{deploy|type=1.47.0-wmf.2|gerrit=1286336|title=Keep all long, non-wrapping values inside parent element|status=}} - {{phabricator|T425176}} {{ircnick|ottomata|ottomata}} {{deploy|type=1.47.0-wmf.2|gerrit=1286341|title=page_change - add revision.revert info|status=}} {{ircnick|atsukoito|atsukoito}} {{deploy|type=config|gerrit=1283711|title=translate: add opensearch-ttmserver-test|status=}} - {{phabricator|T425377}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-12 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-05-12 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. 
}} {{Deployment calendar event card |when=2026-05-12 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-05-12 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-12 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-12 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1|1.47.0-wmf.1}} * group0 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-12 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|alexsanford|alexsanford}} {{deploy|type=config|gerrit=1285905|title=Enforce 2FA requirements for phase 2 groups|status=}} - {{phabricator|T423119}} {{deploy|type=config|gerrit=1286469|title=Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3|status=}} - 
{{phabricator|T423119}} {{phabricator|T423120}} {{ircnick|dbrant|Dmitry}} {{deploy|type=config|gerrit=1285930|title=docroot: Add "get_login_creds" permission to Android app.|status=}} - {{phabricator|T426010}} {{ircnick|Neriah|Neriah}} {{deploy|type=config|gerrit=1285482|title=Allow svwiki bureaucrats to remove sysop rights|status=}} - {{phabricator|T425806}} {{ircnick|VadymTS1|VadymTS1}} {{deploy|type=config|gerrit=1283048|title=Enabling RSS extension for cowikimedia chapter|status=}} - {{phabricator|T425440}} {{deploy|type=config|gerrit=1286390|title=Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary|status=}} - {{phabricator|T425332}} {{ircnick|cscott|C. Scott Ananian}} {{deploy|type=1.47.0-wmf.2|gerrit=1286484|title=Bump wikimedia/parsoid to 0.24.0-a3|status=}} - {{phabricator|T425981}} {{deploy|type=1.47.0-wmf.2|gerrit=1286485|title=Bump wikimedia/parsoid to 0.24.0-a3|status=}} - {{phabricator|T425981}} {{deploy|type=1.47.0-wmf.2|gerrit=1286488|title=Disable unit tests that fail with new vendor release|status=}} {{deploy|type=1.47.0-wmf.2|gerrit=1286489|title=Skip ContentHolderTest that fails with new vendor release|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-12 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-12 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-05-13}}=== {{Deployment calendar event card |when=2026-05-13 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|atsukoito|atsukoito}} {{deploy|type=config|gerrit=1286371|title=translate: add opensearch-ttmserver-test|status=}} - {{phabricator|T425377}} {{ircnick|WMDE-Fisch|WMDE-Fisch}} {{deploy|type=config|gerrit=1286400|title=testwiki: Disable sub-ref's synthetic list defined refs on test wikis|status=}} - {{phabricator|T425967}} {{ircnick|dcausse|dcausse}} {{deploy|type=config|gerrit=1286277|title=Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-13 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1}} * group1 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-13 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-13 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-05-13 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=1.47.0-wmf.1|gerrit=1286359|title=Add configurable user-agent and sparql endpoint url|status=}} - {{phabricator|T425389}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1284900|title=Completely disable MediaWiki page patrolling functions on German Wikipedia|status=}} - {{phabricator|T316393}} {{ircnick|mfossati|mfossati}} {{deploy|type=1.47.0-wmf.2|gerrit=1286518|title=[Share Highlight] Exclude section edit links, footnotes from selection|status=}} - {{phabricator|T423658}} {{deploy|type=1.47.0-wmf.2|gerrit=1286838|title=Add robust color fallbacks for QuoteCard average-color styling|status=}} - {{phabricator|T425358}} {{deploy|type=1.47.0-wmf.2|gerrit=1286839|title=Fixed card width|status=}} - {{phabricator|T425710}} {{deploy|type=1.47.0-wmf.2|gerrit=1286844|title=Adjust image size to match fixed width|status=}} - {{phabricator|T425710}} {{deploy|type=1.47.0-wmf.2|gerrit=1286846|title=ShareHighlight: exclude browsers that don't support CSS has|status=}} - {{phabricator|T424873}} {{deploy|type=1.47.0-wmf.2|gerrit=1286847|title=Also skip instrumentation for unsupported browsers|status=}} - {{phabricator|T424873}} {{ircnick|Dragoniez|Dragoniez}} {{deploy|type=1.47.0-wmf.2|gerrit=1286890|title=ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries|status=}} - {{phabricator|T426033}} {{ircnick|MatmaRex|Bartosz}} 
{{deploy|type=1.47.0-wmf.1|gerrit=1286897|title=ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries|status=}} - {{phabricator|T426033}} {{deploy|type=1.47.0-wmf.1|gerrit=1286891|title=Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders|status=}} - {{phabricator|T425972}} {{deploy|type=1.47.0-wmf.2|gerrit=1286892|title=Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders|status=}} - {{phabricator|T425972}} {{ircnick|kostajh|kostajh}} {{deploy|type=1.47.0-wmf.2|gerrit=1286917|title=WikiEditor: Populate user_groups in EditAttemptStep events|status=}} - {{phabricator|T424010}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-13 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-13 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-13 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-13 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1}} * group1 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-13 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|bpirkle|bpirkle}} {{deploy|type=config|gerrit=1286981|title=Revert "Add wikibase.v1 module to the sandbox were it is present"|status=}} - {{phabricator|T422403}} {{ircnick|ebernhardson|Erik B}} {{deploy|type=config|gerrit=1286997|title=Revert "cirrus: AB test query suggester variants"|status=}} - {{phabricator|T407432}} {{ircnick|Jdlrobson|Jdlrobson}} {{deploy|type=config|gerrit=1287006|title=Update small size for Swedish Wikipedia|status=}} - {{phabricator|T424910}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-13 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-13 15:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-13 23:00 SF |length=1 
|window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-13 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-14}}=== {{Deployment calendar event card |when=2026-05-14 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-14 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2}} * group2 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-14 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-14 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-14 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|annet|annet}} {{deploy|type=config|gerrit=1285913|title=Add ReadingLists Account Creation CTA campaign|status=}} - {{phabricator|T422169}} {{deploy|type=1.47.0-wmf.2|gerrit=1286327|title=WelcomeSurvey: Respect returnTo for campaigns skipping the survey|status=}} - {{phabricator|T422169}} {{ircnick|Nvdtn19|Nvdtn19}} {{deploy|type=config|gerrit=1216721|title=viwikivoyage: enable relatedarticle and pop-up|status=}} - {{phabricator|T405724}} {{ircnick|Krinkle|Krinkle}} {{deploy|type=config|gerrit=1269442|title=Enable wgTrackMediaRequestProvenance on remaining Wikipedias|status=}} - {{phabricator|T414338}} {{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=config|gerrit=1287043|title=Enable the Article Guidance experiment on simplewiki|status=}} - {{phabricator|T426278}} {{ircnick|mfossati|mfossati}} {{deploy|type=1.47.0-wmf.2|gerrit=1287363|title=Scale share-highlight card to fit small viewports|status=}} - {{phabricator|T426247}} {{ircnick|phuedx|Sam Smith}} {{deploy|type=1.47.0-wmf.2|gerrit=1287368|title=ext.wikimediaEvents: Add synth-aa-ncs-1 experiment|status=}} - {{phabricator|T419514}} {{ircnick|robertsky|robertsky}} {{deploy|type=config|gerrit=1287367|title=throttle rule for ESEAP Conference 2026 15-18 May 2026|status=}} - {{phabricator|T426295}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-14 07:30 SF |length=0.5 |window=Test Kitchen Experiment 
Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-14 08:00 SF |length=1 |window=Train log triage |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-05-14 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|Dreamy_Jazz|WBrown (WMF)}} * {{gerrit|1279281}} purge_securepoll: don't exclude private wikis {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-14 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-05-14 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-14 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2}} * group2 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-14 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|JSherman|Jsn.sherman}} {{deploy|type=config|gerrit=1192921|title=Enable AutoModerator on Italian Wikipedia|status=}} - {{phabricator|T405152}} {{deploy|type=config|gerrit=1286974|title=Enable AutoModerator on Albanian Wikipedia|status=}} - {{phabricator|T420450}} {{deploy|type=config|gerrit=1286975|title=Enable AutoModerator on Dutch Wikipedia|status=}} - {{phabricator|T425509}} {{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=config|gerrit=1287427|title=Simplewiki: include article wizard in AG experiment|status=}} - {{phabricator|T426278}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1287433|title=Restrict the changetags user right to bots and sysops on mediawiki.org|status=}} - {{phabricator|T355445}} {{ircnick|Neriah|Neriah}} {{deploy|type=config|gerrit=1287002|title=Disable wgNewUserMessageOnAutoCreate on all WMF wikis|status=}} - {{phabricator|T426206}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-14 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader 
teams do not typically check IRC so assume this is not being used if 5 minutes past the start {{ircnick|jan_drewniak|Jan Drewniak}} {{deploy|type=config|gerrit=1287485|title=Disable Reading Lists survey for Wikipedias|status=}} - {{phabricator|T421776}} }} {{Deployment calendar event card |when=2026-05-14 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-05-15}}=== {{Deployment calendar event card |when=2026-05-15 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-05-15 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-05-16}}=== {{Deployment calendar event card |when=2026-05-16 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of May 18== ==={{Deployment_day|date=2026-05-17}}=== {{Deployment calendar event card |when=2026-05-17 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} ==={{Deployment_day|date=2026-05-18}}=== {{Deployment calendar event card |when=2026-05-18 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-18 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-18 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|Daimona|Daimona}} {{deploy|type=1.47.0-wmf.2|gerrit=1287895|title=Store uncomputed references delta as null, not 0|status=}} - {{phabricator|T426002}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-18 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. 
}} {{Deployment calendar event card |when=2026-05-18 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-05-18 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-18 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-05-18 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-18 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|alexsanford|Alex}}, {{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-05-18 16:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-18 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.47.0-wmf.3</code> }} {{Deployment calendar event card |when=2026-05-18 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.47.0-wmf.3</code> to testwikis }} {{Deployment calendar event card |when=2026-05-18 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-05-18 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-18 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-19}}=== {{Deployment calendar event card |when=2026-05-19 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-19 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2->1.47.0-wmf.3|1.47.0-wmf.2|1.47.0-wmf.2}} * group0 to [[mw:MediaWiki_1.47/wmf.3|1.47.0-wmf.3]] * '''Blockers: {{phabricator|T423912}}''' }} {{Deployment calendar event card |when=2026-05-19 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-19 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-19 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-19 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-05-19 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-19 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-05-19 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-19 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-19 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-19 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-19 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-05-20}}=== {{Deployment calendar event card |when=2026-05-20 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-20 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.3|1.47.0-wmf.2->1.47.0-wmf.3|1.47.0-wmf.2}} * group1 to [[mw:MediaWiki_1.47/wmf.3|1.47.0-wmf.3]] * '''Blockers: {{phabricator|T423912}}''' }} {{Deployment calendar event card |when=2026-05-20 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-20 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-05-20 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-20 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-20 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-20 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-20 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-20 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-20 15:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-20 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-20 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-21}}=== {{Deployment calendar event card |when=2026-05-21 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-21 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.3|1.47.0-wmf.3|1.47.0-wmf.2->1.47.0-wmf.3}} * group2 to [[mw:MediaWiki_1.47/wmf.3|1.47.0-wmf.3]] * '''Blockers: {{phabricator|T423912}}''' }} {{Deployment calendar event card |when=2026-05-21 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-21 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-21 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-21 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-21 08:00 SF |length=1 |window=Train log triage |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-05-21 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-21 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-05-21 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-21 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-21 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-21 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-05-22}}=== {{Deployment calendar event card |when=2026-05-22 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-05-22 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-05-23}}=== {{Deployment calendar event card |when=2026-05-23 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} dvmd8t8x32wiotisu2gq16qw1pkhmsz 2414262 2414261 2026-05-15T14:46:05Z Daimona Eaytoy 11462 /* {{Deployment_day|date=2026-05-18}} */ +1 2414262 wikitext text/x-wiki {{Navigation MediaWiki deployment}} This page tracks '''upcoming''' '''deployments''' of software to the [[:m:Special:SiteMatrix|Wikimedia Foundation servers]]. == Getting started == Ensure you have joined the {{irc|wikimedia-operations}} IRC channel, as all deployment-related communication happens there. If you need help, contact [[:mw:Wikimedia Release Engineering Team|Release Engineering]] on IRC at {{irc|wikimedia-releng}} and ping Tyler (<code>thcipriani</code>). * '''MediaWiki is deployed weekly''' through the [[/Train|Deployment Train]]. Other services follow their own schedule. * '''Times are pinned to San Francisco''', so the UTC time of each window shifts in March and November with [[:en:Daylight saving time in the United States|DST]]. * '''Prefer regular [[Backport windows]]''' over adding new windows. To request deployment of a config change or backport, add your username and Gerrit URL to one of the backport windows on this page. You must be online in #wikimedia-operations on IRC during your deployment and install [[WikimediaDebug]] ahead of time. The #wikimedia-operations channel requires you to [[:m:IRC/Instructions#Register your nickname, identify, and enforce|register your nickname]] before you can join. ** You can use the '''backport scheduling tool''' to more easily edit this page: <div style="text-align: center; margin: 1em 0">{{Clickable button 2|:toollabs:schedule-deployment|Schedule a backport|class=mw-ui-progressive}}</div> * Tasks that meet the [[/Inclusion criteria|Inclusion criteria]] '''require their own windows'''; this includes long-running tasks. '''Schedule more time''' than you think you need to account for delays and setbacks; we recommend one hour for most tasks. 
**To create or modify a recurring deploy window, send a patchset to the [[:gitlab:repos/releng/release/-/blob/main/make-deployment-calendar/deployments-calendar.yaml|deployments-calendar.yaml file]] in <code>repos/releng/release.git</code>. **To create a one-off window, simply edit this page accordingly. ** '''Announce''' changes to the [[mail:ops|ops mailing list]] ahead of time if you anticipate or are uncertain about noticeable impacts to database load, HTTP caching, or the introduction of new cookies. ** '''Announce''' deployments of major features to the community via [[:m:Tech/News/Next|Tech News]] and/or via other [[:mw:Wikimedia_Product_Guidance/Communication_channels|Product communication channels]]. * '''Something went wrong?''' See [[Incident response]]. Is there a user-impacting problem? Communicate in the {{irc|wikimedia-operations}} IRC channel. If there is a Phabricator task, ensure [[:phab:tag/wikimedia-incident/|#Wikimedia-Incident]] is tagged, and consider setting the [[:mw:Phabricator/Project_management#Priority_levels|Unbreak Now]] priority. __TOC__ {{anchor|Next Week|Near Term|Near term|Near-term}}{{clear}} [[Category:Deployment]] {{Note|content=Subscribe in Google Calendar via <code>wikimedia.org_rudis09ii2mm5fk4hgdjeh1u64@group.calendar.google.com</code>.<br>This may not include one-off windows. '''If there are differences, then the wiki page is canonical and correct'''.}} ==Week of May 11== ==={{Deployment_day|date=2026-05-10}}=== {{Deployment calendar event card |when=2026-05-10 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} ==={{Deployment_day|date=2026-05-11}}=== {{Deployment calendar event card |when=2026-05-11 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|sfaci|sfaci}} {{deploy|type=config|gerrit=1278704|title=WikiLambdaApi: update stream configuration|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285352|title=WikiLambdaApi instrument: Sets the custom schemaID|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285406|title=editSaves: getExperiment returns a promise now|status=}} - {{phabricator|T425785}} {{ircnick|dyepezg|Daniel Yepez Garces}} {{deploy|type=config|gerrit=1283048|title=Enabling RSS extension for cowikimedia chapter|status=}} - {{phabricator|T425440}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-11 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-11 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|yerdua_wmde|yerdua_wmde}} {{deploy|type=config|gerrit=1270482|title=Enable and configure WikiProjects prototype on WikiData beta|status=}} - {{phabricator|T421850}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1284900|title=Completely disable MediaWiki page patrolling functions on German Wikipedia|status=}} - {{phabricator|T316393}} {{ircnick|MatmaRex|Bartosz}} {{deploy|type=1.47.0-wmf.1|gerrit=1285460|title=Prevent username registration if the username previously existed|status=}} - {{phabricator|T196386}} {{deploy|type=1.47.0-wmf.1|gerrit=1285461|title=Prevent username registration if the username previously existed (v2)|status=}} - {{phabricator|T196386}} {{deploy|type=config|gerrit=1285448|title=Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes|status=}} - {{phabricator|T196386}} {{deploy|type=1.47.0-wmf.1|gerrit=1285462|title=API: Introduce list=globalusers|status=}} - {{phabricator|T261752}} {{deploy|type=1.47.0-wmf.1|gerrit=1285761|title=list=globalusers: Avoid querying group permissions with empty group list|status=}} - {{phabricator|T425859}} {{ircnick|sfaci|sfaci}} {{deploy|type=config|gerrit=1278704|title=WikiLambdaApi: update stream configuration|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285352|title=WikiLambdaApi instrument: Sets the custom schemaID|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285406|title=editSaves: getExperiment returns a promise now|status=}} - {{phabricator|T425785}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event 
card |when=2026-05-11 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-11 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-05-11 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-11 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-05-11 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|Sergi0|Sergio Gimeno}} {{deploy|type=1.47.0-wmf.1|gerrit=1285743|title=loggedOutWarning: set lastEditor used earlier|status=}} - {{phabricator|T425604}} {{ircnick|jan_drewniak|Jan Drewniak}} * {{gerrit|1285848}} [config] Portal banner deploy {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-11 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-05-11 16:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-11 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.47.0-wmf.2</code> }} {{Deployment calendar event card |when=2026-05-11 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.47.0-wmf.2</code> to testwikis }} {{Deployment calendar event card |when=2026-05-11 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-05-11 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-11 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-12}}=== {{Deployment calendar event card |when=2026-05-12 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|dcausse|dcausse}} {{deploy|type=config|gerrit=1284628|title=cirrus: use a keywork tokenizer for the plain field for autocomplete|status=}} - {{phabricator|T420427}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-12 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1|1.47.0-wmf.1}} * group0 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-12 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-12 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-12 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=config|gerrit=1286334|title=ArticleGuidance: set sparql endpoint|status=}} - {{phabricator|T425389}} {{ircnick|yerdua_wmde|yerdua_wmde}} {{deploy|type=1.47.0-wmf.2|gerrit=1286336|title=Keep all long, non-wrapping values inside parent element|status=}} - {{phabricator|T425176}} {{ircnick|ottomata|ottomata}} {{deploy|type=1.47.0-wmf.2|gerrit=1286341|title=page_change - add revision.revert info|status=}} {{ircnick|atsukoito|atsukoito}} {{deploy|type=config|gerrit=1283711|title=translate: add opensearch-ttmserver-test|status=}} - {{phabricator|T425377}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-12 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-05-12 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. 
}} {{Deployment calendar event card |when=2026-05-12 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-05-12 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-12 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-12 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1|1.47.0-wmf.1}} * group0 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-12 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|alexsanford|alexsanford}} {{deploy|type=config|gerrit=1285905|title=Enforce 2FA requirements for phase 2 groups|status=}} - {{phabricator|T423119}} {{deploy|type=config|gerrit=1286469|title=Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3|status=}} - 
{{phabricator|T423119}} {{phabricator|T423120}} {{ircnick|dbrant|Dmitry}} {{deploy|type=config|gerrit=1285930|title=docroot: Add "get_login_creds" permission to Android app.|status=}} - {{phabricator|T426010}} {{ircnick|Neriah|Neriah}} {{deploy|type=config|gerrit=1285482|title=Allow svwiki bureaucrats to remove sysop rights|status=}} - {{phabricator|T425806}} {{ircnick|VadymTS1|VadymTS1}} {{deploy|type=config|gerrit=1283048|title=Enabling RSS extension for cowikimedia chapter|status=}} - {{phabricator|T425440}} {{deploy|type=config|gerrit=1286390|title=Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary|status=}} - {{phabricator|T425332}} {{ircnick|cscott|C. Scott Ananian}} {{deploy|type=1.47.0-wmf.2|gerrit=1286484|title=Bump wikimedia/parsoid to 0.24.0-a3|status=}} - {{phabricator|T425981}} {{deploy|type=1.47.0-wmf.2|gerrit=1286485|title=Bump wikimedia/parsoid to 0.24.0-a3|status=}} - {{phabricator|T425981}} {{deploy|type=1.47.0-wmf.2|gerrit=1286488|title=Disable unit tests that fail with new vendor release|status=}} {{deploy|type=1.47.0-wmf.2|gerrit=1286489|title=Skip ContentHolderTest that fails with new vendor release|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-12 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-12 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-05-13}}=== {{Deployment calendar event card |when=2026-05-13 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|atsukoito|atsukoito}} {{deploy|type=config|gerrit=1286371|title=translate: add opensearch-ttmserver-test|status=}} - {{phabricator|T425377}} {{ircnick|WMDE-Fisch|WMDE-Fisch}} {{deploy|type=config|gerrit=1286400|title=testwiki: Disable sub-ref's synthetic list defined refs on test wikis|status=}} - {{phabricator|T425967}} {{ircnick|dcausse|dcausse}} {{deploy|type=config|gerrit=1286277|title=Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-13 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1}} * group1 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-13 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-13 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-05-13 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=1.47.0-wmf.1|gerrit=1286359|title=Add configurable user-agent and sparql endpoint url|status=}} - {{phabricator|T425389}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1284900|title=Completely disable MediaWiki page patrolling functions on German Wikipedia|status=}} - {{phabricator|T316393}} {{ircnick|mfossati|mfossati}} {{deploy|type=1.47.0-wmf.2|gerrit=1286518|title=[Share Highlight] Exclude section edit links, footnotes from selection|status=}} - {{phabricator|T423658}} {{deploy|type=1.47.0-wmf.2|gerrit=1286838|title=Add robust color fallbacks for QuoteCard average-color styling|status=}} - {{phabricator|T425358}} {{deploy|type=1.47.0-wmf.2|gerrit=1286839|title=Fixed card width|status=}} - {{phabricator|T425710}} {{deploy|type=1.47.0-wmf.2|gerrit=1286844|title=Adjust image size to match fixed width|status=}} - {{phabricator|T425710}} {{deploy|type=1.47.0-wmf.2|gerrit=1286846|title=ShareHighlight: exclude browsers that don't support CSS has|status=}} - {{phabricator|T424873}} {{deploy|type=1.47.0-wmf.2|gerrit=1286847|title=Also skip instrumentation for unsupported browsers|status=}} - {{phabricator|T424873}} {{ircnick|Dragoniez|Dragoniez}} {{deploy|type=1.47.0-wmf.2|gerrit=1286890|title=ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries|status=}} - {{phabricator|T426033}} {{ircnick|MatmaRex|Bartosz}}
{{deploy|type=1.47.0-wmf.1|gerrit=1286897|title=ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries|status=}} - {{phabricator|T426033}} {{deploy|type=1.47.0-wmf.1|gerrit=1286891|title=Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders|status=}} - {{phabricator|T425972}} {{deploy|type=1.47.0-wmf.2|gerrit=1286892|title=Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders|status=}} - {{phabricator|T425972}} {{ircnick|kostajh|kostajh}} {{deploy|type=1.47.0-wmf.2|gerrit=1286917|title=WikiEditor: Populate user_groups in EditAttemptStep events|status=}} - {{phabricator|T424010}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-13 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-13 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-13 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-13 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1}} * group1 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-13 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|bpirkle|bpirkle}} {{deploy|type=config|gerrit=1286981|title=Revert "Add wikibase.v1 module to the sandbox were it is present"|status=}} - {{phabricator|T422403}} {{ircnick|ebernhardson|Erik B}} {{deploy|type=config|gerrit=1286997|title=Revert "cirrus: AB test query suggester variants"|status=}} - {{phabricator|T407432}} {{ircnick|Jdlrobson|Jdlrobson}} {{deploy|type=config|gerrit=1287006|title=Update small size for Swedish Wikipedia|status=}} - {{phabricator|T424910}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-13 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-13 15:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-13 23:00 SF |length=1 
|window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-13 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-14}}=== {{Deployment calendar event card |when=2026-05-14 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-14 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2}} * group2 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-14 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-14 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-14 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|annet|annet}} {{deploy|type=config|gerrit=1285913|title=Add ReadingLists Account Creation CTA campaign|status=}} - {{phabricator|T422169}} {{deploy|type=1.47.0-wmf.2|gerrit=1286327|title=WelcomeSurvey: Respect returnTo for campaigns skipping the survey|status=}} - {{phabricator|T422169}} {{ircnick|Nvdtn19|Nvdtn19}} {{deploy|type=config|gerrit=1216721|title=viwikivoyage: enable relatedarticle and pop-up|status=}} - {{phabricator|T405724}} {{ircnick|Krinkle|Krinkle}} {{deploy|type=config|gerrit=1269442|title=Enable wgTrackMediaRequestProvenance on remaining Wikipedias|status=}} - {{phabricator|T414338}} {{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=config|gerrit=1287043|title=Enable the Article Guidance experiment on simplewiki|status=}} - {{phabricator|T426278}} {{ircnick|mfossati|mfossati}} {{deploy|type=1.47.0-wmf.2|gerrit=1287363|title=Scale share-highlight card to fit small viewports|status=}} - {{phabricator|T426247}} {{ircnick|phuedx|Sam Smith}} {{deploy|type=1.47.0-wmf.2|gerrit=1287368|title=ext.wikimediaEvents: Add synth-aa-ncs-1 experiment|status=}} - {{phabricator|T419514}} {{ircnick|robertsky|robertsky}} {{deploy|type=config|gerrit=1287367|title=throttle rule for ESEAP Conference 2026 15-18 May 2026|status=}} - {{phabricator|T426295}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-14 07:30 SF |length=0.5 |window=Test Kitchen Experiment 
Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-14 08:00 SF |length=1 |window=Train log triage |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-05-14 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|Dreamy_Jazz|WBrown (WMF)}} * {{gerrit|1279281}} purge_securepoll: don't exclude private wikis {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-14 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-05-14 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-14 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2}} * group2 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-14 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|JSherman|Jsn.sherman}} {{deploy|type=config|gerrit=1192921|title=Enable AutoModerator on Italian Wikipedia|status=}} - {{phabricator|T405152}} {{deploy|type=config|gerrit=1286974|title=Enable AutoModerator on Albanian Wikipedia|status=}} - {{phabricator|T420450}} {{deploy|type=config|gerrit=1286975|title=Enable AutoModerator on Dutch Wikipedia|status=}} - {{phabricator|T425509}} {{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=config|gerrit=1287427|title=Simplewiki: include article wizard in AG experiment|status=}} - {{phabricator|T426278}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1287433|title=Restrict the changetags user right to bots and sysops on mediawiki.org|status=}} - {{phabricator|T355445}} {{ircnick|Neriah|Neriah}} {{deploy|type=config|gerrit=1287002|title=Disable wgNewUserMessageOnAutoCreate on all WMF wikis|status=}} - {{phabricator|T426206}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-14 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader 
teams do not typically check IRC so assume this is not being used if 5 minutes past the start {{ircnick|jan_drewniak|Jan Drewniak}} {{deploy|type=config|gerrit=1287485|title=Disable Reading Lists survey for Wikipedias|status=}} - {{phabricator|T421776}} }} {{Deployment calendar event card |when=2026-05-14 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-05-15}}=== {{Deployment calendar event card |when=2026-05-15 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-05-15 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-05-16}}=== {{Deployment calendar event card |when=2026-05-16 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of May 18== ==={{Deployment_day|date=2026-05-17}}=== {{Deployment calendar event card |when=2026-05-17 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} ==={{Deployment_day|date=2026-05-18}}=== {{Deployment calendar event card |when=2026-05-18 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-18 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-18 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|Daimona|Daimona (work)}} {{deploy|type=1.47.0-wmf.2|gerrit=1287895|title=Store uncomputed references delta as null, not 0|status=}} - {{phabricator|T426002}} * Fixup production data for [[:phab:T426002|T426002]], running query in task description {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-18 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. 
}} {{Deployment calendar event card |when=2026-05-18 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-05-18 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-18 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-05-18 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-18 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|alexsanford|Alex}}, {{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-05-18 16:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-18 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.47.0-wmf.3</code> }} {{Deployment calendar event card |when=2026-05-18 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.47.0-wmf.3</code> to testwikis }} {{Deployment calendar event card |when=2026-05-18 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-05-18 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment.
}} {{Deployment calendar event card |when=2026-05-18 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-19}}=== {{Deployment calendar event card |when=2026-05-19 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-19 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2->1.47.0-wmf.3|1.47.0-wmf.2|1.47.0-wmf.2}} * group0 to [[mw:MediaWiki_1.47/wmf.3|1.47.0-wmf.3]] * '''Blockers: {{phabricator|T423912}}''' }} {{Deployment calendar event card |when=2026-05-19 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-19 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-19 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-19 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-05-19 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-19 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-05-19 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-19 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-19 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-19 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-19 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-05-20}}=== {{Deployment calendar event card |when=2026-05-20 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-20 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.3|1.47.0-wmf.2->1.47.0-wmf.3|1.47.0-wmf.2}} * group1 to [[mw:MediaWiki_1.47/wmf.3|1.47.0-wmf.3]] * '''Blockers: {{phabricator|T423912}}''' }} {{Deployment calendar event card |when=2026-05-20 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-20 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-05-20 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-20 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-20 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-20 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-20 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-20 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-20 15:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-20 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-20 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-21}}=== {{Deployment calendar event card |when=2026-05-21 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-21 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.3|1.47.0-wmf.3|1.47.0-wmf.2->1.47.0-wmf.3}} * group2 to [[mw:MediaWiki_1.47/wmf.3|1.47.0-wmf.3]] * '''Blockers: {{phabricator|T423912}}''' }} {{Deployment calendar event card |when=2026-05-21 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-21 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-21 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-21 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-21 08:00 SF |length=1 |window=Train log triage |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-05-21 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-21 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-05-21 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-21 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-21 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-21 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-05-22}}=== {{Deployment calendar event card |when=2026-05-22 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-05-22 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-05-23}}=== {{Deployment calendar event card |when=2026-05-23 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} oi90dj216yq4bt3xag8z0pb86hh41jm 2414308 2414262 2026-05-16T02:00:29Z DeploymentCalendarTool 20896 Remove Week of May 11 2414308 wikitext text/x-wiki {{Navigation MediaWiki deployment}} This page tracks '''upcoming''' '''deployments''' of software to the [[:m:Special:SiteMatrix|Wikimedia Foundation servers]]. == Getting started == Ensure you have joined the {{irc|wikimedia-operations}} IRC channel, as all deployment-related communication happens there. If you need help, contact [[:mw:Wikimedia Release Engineering Team|Release Engineering]] on IRC at {{irc|wikimedia-releng}} and ping Tyler (<code>thcipriani</code>). * '''MediaWiki is deployed weekly''' through the [[/Train|Deployment Train]]. Other services follow their own schedule. * '''Times are pinned to San Francisco''', so the UTC time changes in March and November per [[:en:Daylight saving time in the United States|DST]]. * '''Prefer regular [[Backport windows]]''' over adding new windows. To request deployment of a config change or backport, add your username and Gerrit URL to one of the backport windows on this page. You must be online in #wikimedia-operations on IRC during your deployment and install [[WikimediaDebug]] ahead of time. The #wikimedia-operations channel requires you to [[:m:IRC/Instructions#Register your nickname, identify, and enforce|register your nickname]] before you can join. ** You can use the '''backport scheduling tool''' to more easily edit this page: <div style="text-align: center; margin: 1em 0">{{Clickable button 2|:toollabs:schedule-deployment|Schedule a backport|class=mw-ui-progressive}}</div> * Tasks that meet [[/Inclusion criteria|Inclusion criteria]] '''require their own windows'''; this includes long-running tasks. '''Schedule more time''' than you think you need to account for delays and setbacks; we recommend one hour for most tasks. 
**To create or modify a recurring deploy window, send a patchset to the [[:gitlab:repos/releng/release/-/blob/main/make-deployment-calendar/deployments-calendar.yaml|deployments-calendar.yaml file]] in <code>repos/releng/release.git</code>. **To create a one-off window, simply edit this page accordingly. ** '''Announce''' changes to the [[mail:ops|ops mailing list]] ahead of time if you anticipate or are uncertain about noticeable impacts to database load, HTTP caching, or the introduction of new cookies. ** '''Announce''' deployments of major features to the community via [[:m:Tech/News/Next|Tech News]] and/or via other [[:mw:Wikimedia_Product_Guidance/Communication_channels|Product communication channels]]. * '''Something went wrong?''' See [[Incident response]]. Is there a user-impacting problem? Communicate in the {{irc|wikimedia-operations}} IRC channel. If there is a Phabricator task, ensure [[:phab:tag/wikimedia-incident/|#Wikimedia-Incident]] is tagged, and consider setting the [[:mw:Phabricator/Project_management#Priority_levels|Unbreak Now]] priority. __TOC__ {{anchor|Next Week|Near Term|Near term|Near-term}}{{clear}} [[Category:Deployment]] {{Note|content=Subscribe in Google Calendar via <code>wikimedia.org_rudis09ii2mm5fk4hgdjeh1u64@group.calendar.google.com</code>.<br>This may not include one-off windows. '''If there are differences, then the wiki page is canonical and correct'''.}} ==Week of May 18== ==={{Deployment_day|date=2026-05-17}}=== {{Deployment calendar event card |when=2026-05-17 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} ==={{Deployment_day|date=2026-05-18}}=== {{Deployment calendar event card |when=2026-05-18 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-18 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-18 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|Daimona|Daimona (work)}} {{deploy|type=1.47.0-wmf.2|gerrit=1287895|title=Store uncomputed references delta as null, not 0|status=}} - {{phabricator|T426002}} * Fixup production data for [[:phab:T426002|T426002]], running query in task description {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-18 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. 
}} {{Deployment calendar event card |when=2026-05-18 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-05-18 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-18 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-05-18 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-18 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|alexsanford|Alex}}, {{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-05-18 16:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-18 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.47.0-wmf.3</code> }} {{Deployment calendar event card |when=2026-05-18 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.47.0-wmf.3</code> to testwikis }} {{Deployment calendar event card |when=2026-05-18 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-05-18 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-18 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-19}}=== {{Deployment calendar event card |when=2026-05-19 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-19 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2->1.47.0-wmf.3|1.47.0-wmf.2|1.47.0-wmf.2}} * group0 to [[mw:MediaWiki_1.47/wmf.3|1.47.0-wmf.3]] * '''Blockers: {{phabricator|T423912}}''' }} {{Deployment calendar event card |when=2026-05-19 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-19 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-19 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-19 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-05-19 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-19 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-05-19 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-19 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-19 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-19 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-19 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-05-20}}=== {{Deployment calendar event card |when=2026-05-20 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-20 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.3|1.47.0-wmf.2->1.47.0-wmf.3|1.47.0-wmf.2}} * group1 to [[mw:MediaWiki_1.47/wmf.3|1.47.0-wmf.3]] * '''Blockers: {{phabricator|T423912}}''' }} {{Deployment calendar event card |when=2026-05-20 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-20 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-05-20 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-20 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-20 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-20 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-20 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-20 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-20 15:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-20 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-20 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-21}}=== {{Deployment calendar event card |when=2026-05-21 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-21 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.3|1.47.0-wmf.3|1.47.0-wmf.2->1.47.0-wmf.3}} * group2 to [[mw:MediaWiki_1.47/wmf.3|1.47.0-wmf.3]] * '''Blockers: {{phabricator|T423912}}''' }} {{Deployment calendar event card |when=2026-05-21 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-21 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-21 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-21 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-21 08:00 SF |length=1 |window=Train log triage |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-05-21 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-21 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-05-21 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-21 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-21 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-21 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-05-22}}=== {{Deployment calendar event card |when=2026-05-22 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-05-22 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-05-23}}=== {{Deployment calendar event card |when=2026-05-23 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of May 25== ==={{Deployment_day|date=2026-05-24}}=== {{Deployment calendar event card |when=2026-05-24 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} ==={{Deployment_day|date=2026-05-25}}=== {{Deployment calendar event card |when=2026-05-25 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-25 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-25 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-25 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-25 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-05-25 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-25 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-05-25 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-25 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|alexsanford|Alex}}, {{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2026-05-25 16:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-25 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.47.0-wmf.4</code> }} {{Deployment calendar event card |when=2026-05-25 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.47.0-wmf.4</code> to testwikis }} {{Deployment calendar event card |when=2026-05-25 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean 
auto</code> }} {{Deployment calendar event card |when=2026-05-25 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-25 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-26}}=== {{Deployment calendar event card |when=2026-05-26 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-26 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.3->1.47.0-wmf.4|1.47.0-wmf.3|1.47.0-wmf.3}} * group0 to [[mw:MediaWiki_1.47/wmf.4|1.47.0-wmf.4]] * '''Blockers: {{phabricator|T423913}}''' }} {{Deployment calendar event card |when=2026-05-26 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-26 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-26 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-26 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-05-26 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-26 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-05-26 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-26 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-26 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-26 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-26 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-05-27}}=== {{Deployment calendar event card |when=2026-05-27 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-27 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.4|1.47.0-wmf.3->1.47.0-wmf.4|1.47.0-wmf.3}} * group1 to [[mw:MediaWiki_1.47/wmf.4|1.47.0-wmf.4]] * '''Blockers: {{phabricator|T423913}}''' }} {{Deployment calendar event card |when=2026-05-27 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-27 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-05-27 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-27 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-27 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-27 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment.
}} {{Deployment calendar event card |when=2026-05-27 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-27 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-27 15:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-27 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-27 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-28}}=== {{Deployment calendar event card |when=2026-05-28 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-28 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.4|1.47.0-wmf.4|1.47.0-wmf.3->1.47.0-wmf.4}} * group2 to [[mw:MediaWiki_1.47/wmf.4|1.47.0-wmf.4]] * '''Blockers: {{phabricator|T423913}}''' }} {{Deployment calendar event card |when=2026-05-28 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-28 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-28 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-28 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-28 08:00 SF |length=1 |window=Train log triage |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-05-28 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-28 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-05-28 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-28 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-28 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-28 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-05-29}}=== {{Deployment calendar event card |when=2026-05-29 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-05-29 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-05-30}}=== {{Deployment calendar event card |when=2026-05-30 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} og5mltks9hhzphpic35isxwnayasqrq Server Admin Log 0 7919 2414250 2414249 2026-05-15T11:59:15Z Stashbot 7414 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 2414250 wikitext text/x-wiki == 2026-05-15 == * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet 
with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: 
host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to 
tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache 
ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json 
* 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision 
for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: 
START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP 
reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" == 2026-05-12 == * 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] == 2026-05-11 == * 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s) * 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] * 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s) * 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] * 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s) * 21:47 cjming@deploy1003: cjming: Continuing with deployment * 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]]
* 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json
* 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json
* 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s)
* 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json
* 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment
* 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement
* 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
* 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
* 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]]
* 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json
* 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json
* 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
* 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json
* 15:10 papaul: ongoing switch refresh in ULSFO
* 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s)
* 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-05-15 == * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START
helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook 
sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces 
(exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 
08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - 
Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 
vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP 
reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:29 phuedx@deploy1003: phuedx: Continuing with deployment
* 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json
* 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]]
* 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox
== 2026-05-07 ==
* 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]]
* 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021
* 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021
* 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s)
* 21:48 zabe@deploy1003: zabe: Continuing with deployment
* 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]]
== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>
apq67u9rivtbl6ek4not7bp0i9uqhue 2414252 2414251 2026-05-15T12:18:22Z Stashbot 7414 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" 2414252 wikitext text/x-wiki
== 2026-05-15 ==
* 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55
mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - 
Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo 
cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 
to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for 
host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: 
Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27
vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]]
* 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s)
* 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]]
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s)
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment
* 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug).
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it don't exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 2414253 2414252 2026-05-15T12:18:51Z Stashbot 7414 2414253 wikitext text/x-wiki == 2026-05-15 == * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) 
- Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - 
mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - 
cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] 
START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit 
(dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with 
chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 
mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet 
with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]]) * 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s) * 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment * 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s) * 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. 
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]]
* {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}}
* 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment
* {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}}
* 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002
* 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
* {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}}
* 13:25 moritzm: installing openjdk-11 security updates
* 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
* 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s)
* 13:07 sbisson@deploy1003: sbisson: Continuing with deployment
* 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw
* 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]]
* 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s)
* 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment
* 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]]
* 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.*
* 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]])
* 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]]
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
* 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
* 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]]
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie
* 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed
* 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]]
* 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]]
* 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
* 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
* 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s)
* 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment
* 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]]
* 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
* 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
* 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003"
* 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003"
* 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm
* 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s)
* 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment
* 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]]
* 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s)
* 15:54 zabe@deploy1003: zabe: Continuing with deployment
* 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm
* 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm
* 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
* 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
* 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
* 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm
* 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm
* 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'"
* 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
* 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s)
* 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]]
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie
* 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change]
* 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change]
* 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s)
* 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
* 14:26 kharlan@deploy1003: kharlan: Continuing with deployment
* 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'"
* 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
* 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it don't exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s) * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s) * 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" == 2026-05-01 == * 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 0uhr3k64yprv24vii6227gwfsgfcuw7 2414254 2414253 2026-05-15T12:18:53Z Stashbot 7414 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 2414254 wikitext text/x-wiki == 2026-05-15 == * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet 
with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera 
generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera 
generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to 
https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END 
(FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image 
(duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP 
reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]]
* 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s)
* 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271
* 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment
* 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]]
* 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]]
* 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
* 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
* 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s)
* 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment
* 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]]
* 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json
* 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s)
* 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]]
* 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json
* 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s)
* 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment
* 12:42 moritzm: installing node-tar security updates
* 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet
* 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]]
* 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s)
* 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json
* 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet
* 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet
* 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json
* 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json
* 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
* 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json
* 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
* 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json
* 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json
* 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie
* 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
* 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
* 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie
* 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json
* 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie
* 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie * 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie * 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie * 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s) * 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] * 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s) * 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s) * 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment * 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] * 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s) * 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment * 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
== 2026-05-15 == * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 
10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 
cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START 
helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 
'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis 
set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: 
Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with 
chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]]
* 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm
* 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s)
* 20:46 sbisson@deploy1003: sbisson: Continuing with deployment
* 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]]
* 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
* 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
* 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s)
* 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment
* 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm
* 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm
* 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]]
* 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
* 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm
* 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s)
* 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
* 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment
* 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]]
* 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm
* 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm
* 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
* 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286
* 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286
* 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
* 19:26 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
* 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
* 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm
* 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm
* 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
* 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
* 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm
* 17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 17:10 cmooney@dns2005: END - running authdns-update
* 17:09 cmooney@dns2005: START - running authdns-update
* 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie
* 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
* 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
* 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
* 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie
* 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289
* 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s)
* 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:12 bearloga@deploy1003: bearloga: Continuing with deployment
* 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]]
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm
* 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm
* 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json
* 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289
* 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s)
* 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285
* 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:29 phuedx@deploy1003: phuedx: Continuing with deployment
* 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json
* 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]]
* 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]]) * 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s) * 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment * 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s) * 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]]
* 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.*
* 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]])
* 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]]
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
* 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
* 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]]
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie
* 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed
* 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]]
* 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]]
* 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s)
* 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment
* 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]]
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s)
* 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment
* 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274
* 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s)
* 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271
* 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment
* 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] == 2026-05-11 == * 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s) * 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] * 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s) * 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] * 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s) * 21:47 cjming@deploy1003: cjming: Continuing with deployment * 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * 21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it don't exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]]
* 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json
* 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json
* 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s)
* 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json
* 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment
* 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement
* 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
* 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
* 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]]
* 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json
* 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json
* 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
* 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json
* 15:10 papaul: ongoing switch refresh in ULSFO
* 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s)
* 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]]
* 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie
* 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json
* 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json
* 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts
* 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts
* 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
* 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
* 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json
* 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
* 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh
* 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh
* 14:25 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh
* 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001
* 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001
* 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001
* 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003"
* 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003"
* 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json
* 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox
* 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001
* 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie
* 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json
* 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]]
* 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]]
* 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]]
* 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s)
* 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
* 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
* 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
* 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
* 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
* 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
* 13:55 sbisson@deploy1003: sbisson: Continuing with deployment
* 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable
* 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]]
* 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org
* 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json
* 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s)
* 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment
* 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]]
* 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json
* 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json
* 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
* 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json
* 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json
* 13:13 moritzm: installing jaraco.context security updates
* 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet
* 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm
* 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json
* 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json
* 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
* 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
* 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic
* 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json
* 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
* 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json
* 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
* 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage
* 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage
* 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json
* 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json
* 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json
* 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json
* 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
* 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json
* 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json
* 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm
* 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002"
* 11:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002"
* 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors
* 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors
* 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002"
* 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json
* 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002"
* 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet
* 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet
* 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json
* 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to /var/cache/conftool/dbconfig/20260504-113620-fceratto.json
* 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
* 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json
* 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie
* 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage
* 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage
* 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json
* 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json
* 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json
* 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance
* 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json
* 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
* 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json
* 10:48 moritzm: installing bash updates from trixie point release
* 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json
* 10:42 moritzm: installing postgresql-17 security updates
* 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie
* 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie
* 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm
* 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json
* 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002"
* 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002"
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors
* 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002"
* 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002"
* 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json
* 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet
* 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json
* 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json
* 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance
* 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage
* 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage
* 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to /var/cache/conftool/dbconfig/20260504-100818-fceratto.json
* 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie
* 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie
* 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie
* 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie
* 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org
* 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json
* 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org
* 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json
* 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json
* 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
* 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
* 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
* 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
* 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie
* 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json
* 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json
* 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events
* 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events
* 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events
* 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json
* 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json
* 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json
* 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie
* 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json
* 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet
* 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet
* 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet
* 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet
* 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet
* 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet
* 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet
* 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1001.eqiad.wmnet
* 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet
* 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet
* 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json
* 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie
* 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts
* 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json
* 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
* 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json
* 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
* 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage
* 08:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage
* 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
* 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s)
* 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
* 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync
* 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync
* 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
* 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts
* 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment
* 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> sx2wfri983hsaglejlvyrj4h453yhfs 2414278 2414277 2026-05-15T16:02:28Z Stashbot 7414 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts 2414278 wikitext text/x-wiki == 2026-05-15 == * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with 
reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera 
data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook 
sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook 
sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 
08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook 
sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] 
synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]]
* 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s)
* 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]]
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s)
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment
* 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]]
* 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021
* 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021
* 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s)
* 21:48 zabe@deploy1003: zabe: Continuing with deployment
* 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]]
* 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie
* 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003"
* 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003"
* 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021
* 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021
* 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
* 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s)
* 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
* 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment
* 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]]
* 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie
* 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie
* 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie
* 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie
* 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie
* 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie
* 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm
* 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
* 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
* 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
* 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
* 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie
* 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]]
* 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
* 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:37 dzahn@dns1005: END - running authdns-update
* 18:35 dzahn@dns1005: START - running authdns-update
* 18:33 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1
* 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie
* 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm
* 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo
* 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
* 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002
* 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
* 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
* 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
* 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
* 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
* 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
* 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
* 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
* 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
* 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
* 17:32 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
* 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
* 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]]
* 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo
* 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
* 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
* 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
* 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
* 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
* 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
* 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
* 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
* 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
* 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
* 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
* 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
* 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
* 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
* 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
* 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
* 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
* 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
* 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work
* 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo
* 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]]
* 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
* 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie
* 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm
* 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm
* 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
* 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
* 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
* 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm
* 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm
* 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'"
* 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
* 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s)
* 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]]
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie
* 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change]
* 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change]
* 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s)
* 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
* 14:26 kharlan@deploy1003: kharlan: Continuing with deployment
* 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'"
* 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
* 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet
* 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm
* 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
* 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
* 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
* 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]]
* 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s)
* 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
* 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie
* 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]]
* 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s)
* 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet
* 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
* 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
* 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie
* 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002
* 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie
* 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]]
* 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
* 13:45 jgreen@dns1004: END - running authdns-update
* 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s)
* 13:44 jgreen@dns1004: START - running authdns-update
* 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
* 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm
* 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors
* 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors
* 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment
* 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 13:31 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors
* 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS trixie
* 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors
* 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003"
* 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003"
* 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]]
* 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie
* 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie
* 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie
* 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie
* 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie
* 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update
* 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage
* 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage
* 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie
* 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie
* 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie
* 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie
* 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s)
* 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie
* 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]]
* 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage
* 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
* 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
* 11:50 moritzm: installing openjdk-17 security updates
* 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
* 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie
* 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot
* 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie
* 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage
* 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie
* 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
* 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm
* 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
* 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json
* 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie
* 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json
* 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
* 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie
* 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie
* 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json
* 11:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie
* 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie
* 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot
* 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
* 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
* 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
* 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
* 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
* 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm
* 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
* 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
* 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
* 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
* 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]]
* 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
* 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
* 15:39 dzahn@dns1005: END - running authdns-update
* 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore
* 15:37 dzahn@dns1005: START - running authdns-update
* 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
* 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
* 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
* 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
* 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
* 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
* 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s)
* 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json
* 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" == 2026-05-01 == * 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-05-15 == * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook 
sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook 
sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox 
(exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 
jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 
mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with 
chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook 
sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision 
(exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 
egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]]) * 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s) * 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment * 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s) * 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for ulsfo network new switches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for ulsfo network new switches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]]
* 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s)
* 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]]
* 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s)
* 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:10 cjming@deploy1003: cjming: Continuing with deployment
* 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]]
* 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021
* 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021
* 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s)
* 21:48 zabe@deploy1003: zabe: Continuing with deployment
* 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]]
* 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s)
* 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001
* 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001
* 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie
* 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]]
== 2026-05-05 ==
* 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
* 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
* 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s)
* 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]]
* 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s)
* 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s) * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s) * 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] == 2026-05-04 == * 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]]

== Other archives ==
See [[Server Admin Log/Archives]].

<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>

1q098hbr2w3pnpiq5y3t1708s5enm4s 2414282 2414281 2026-05-15T16:53:16Z Stashbot 7414 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply 2414282 wikitext text/x-wiki

== 2026-05-15 ==
* 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage
(exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 
7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook 
sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to 
tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook 
sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook 
sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet
with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: 
overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]]
* 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]]
* 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]]
* 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s)
* 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment
* 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]]
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s)
* 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment
* 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274
* 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]]
* 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
* 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
* 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s)
* 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage
* 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage
* 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage
* 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]]
* 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]]
* 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s)
* 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm
* 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]]
* 20:20 dbrant@deploy1003: dbrant: Continuing with deployment
* 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s)
* 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271
* 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment
* 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm
* 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm
* 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
* 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
* 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
* 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm
* 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm
* 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'"
* 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
* 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s)
* 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]]
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie
* 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change]
* 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change]
* 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s)
* 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
* 14:26 kharlan@deploy1003: kharlan: Continuing with deployment
* 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'"
* 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
* 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet
* 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm
* 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
* 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
* 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
* 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]]
* 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s)
* 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
* 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie
* 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]]
* 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s)
* 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet
* 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
* 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
* 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie * 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie * 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie * 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-03 == * 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s) * 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"

== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]]

== Other archives ==
See [[Server Admin Log/Archives]].

<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>

n3gr6rvjggg2slbg0w89zqmv5xc1w9y 2414289 2414282 2026-05-15T19:18:21Z Stashbot 7414 vriley@cumin1003: START - Cookbook sre.dns.netbox 2414289 wikitext text/x-wiki

== 2026-05-15 ==
* 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
* 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
* 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts
* 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s)
* 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003"
* 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003"
* 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet
* 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012
* 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet
* 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage
(exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 
7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook 
sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to 
tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook 
sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook 
sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet 
with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: 
overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]]
* 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]]
* 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json
* 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s)
* 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]]
* 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json
* 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s)
* 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment
* 12:42 moritzm: installing node-tar security updates
* 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet
* 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]]
* 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s)
* 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json
* 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet
* 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet
* 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json
* 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json
* 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
* 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json
* 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
* 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json
* 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json
* 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie
* 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
* 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
* 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie
* 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json
* 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie
* 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]]
== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
423vba0orlshgxv9qxq3bisp5l4w2kc 2414291 2414289 2026-05-15T19:21:10Z Stashbot 7414 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 2414291 wikitext text/x-wiki
== 2026-05-15 ==
* 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START
helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 
mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host 
ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet 
with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED *
23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 
egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]]
* 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s)
* 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm
* 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment
* 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]]
* 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm
* 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s)
* 20:46 sbisson@deploy1003: sbisson: Continuing with deployment
* 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]]
* 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
* 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
* 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s)
* 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment
* 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm
* 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm
* 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]]
* 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
* 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm
* 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s)
* 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
* 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment
* 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]]
* 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm
* 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm
* 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
* 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286
* 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286
* 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
* 19:26 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
* 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
* 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm
* 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm
* 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
* 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
* 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm
* 17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 17:10 cmooney@dns2005: END - running authdns-update
* 17:09 cmooney@dns2005: START - running authdns-update
* 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie
* 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
* 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
* 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
* 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie
* 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289
* 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s)
* 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:12 bearloga@deploy1003: bearloga: Continuing with deployment
* 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]]
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm
* 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm
* 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json
* 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289
* 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s)
* 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285
* 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:29 phuedx@deploy1003: phuedx: Continuing with deployment
* 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json
* 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]]) * 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s) * 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment * 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s) * 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. 
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274
* 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"

== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s)
* 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271
* 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment
* 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] == 2026-05-11 == * 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s) * 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] * 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s) * 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] * 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s) * 21:47 cjming@deploy1003: cjming: Continuing with deployment * 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]]
* 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json
* 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json
* 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s)
* 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json
* 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment
* 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement
* 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
* 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
* 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]]
* 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json
* 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json
* 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
* 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json
* 15:10 papaul: ongoing switch refresh in ULSFO
* 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s)
* 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]]

== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>
3xafd2v6edd2mwz99erhh0qfui7ob7o 2414292 2414291 2026-05-15T19:21:39Z Stashbot 7414 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 2414292 wikitext text/x-wiki
== 2026-05-15 ==
* 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
* 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
* 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
* 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts
* 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s)
* 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003"
* 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003"
* 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet
* 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012
* 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet
* 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE
helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 
167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] 
DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host 
sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 
mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS 
trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - 
Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook 
sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm
* 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
* 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
* 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision
(exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 
vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 == 2026-05-13 == * 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]]) * 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s) * 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment * 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s) * 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. 
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]]
* 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s)
* 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]]
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s)
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment
* 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]

== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable; likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
== 2026-05-15 == * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook
sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 
mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host 
sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera 
data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie 
* 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 
elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 
2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm
* 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm
* 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
* 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286
* 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286
* 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
* 19:26 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
* 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
* 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm
* 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm
* 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
* 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
* 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm
* 17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 17:10 cmooney@dns2005: END - running authdns-update
* 17:09 cmooney@dns2005: START - running authdns-update
* 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie
* 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
* 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
* 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
* 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie
* 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289
* 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s)
* 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:12 bearloga@deploy1003: bearloga: Continuing with deployment
* 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]]
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm
* 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm
* 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json
* 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289
* 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s)
* 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285
* 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:29 phuedx@deploy1003: phuedx: Continuing with deployment
* 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json
* 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]]
* 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json
* 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s)
* 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
* 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:45 krinkle@deploy1003: krinkle: Continuing with deployment
* 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
* 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm
* 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]]
* 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s)
* 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment
* 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned
* 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned
* 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned
* 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned
* 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm
* 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]]
* 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283
* 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283
* 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s)
* 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003"
* 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003"
* 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:12 sbisson@deploy1003: sbisson: Continuing with deployment
* 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover
* 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282
* 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]]
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json
* 13:07 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1282
* 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003"
* 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003"
* 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json
* 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281
* 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]]
* 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]])
* 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281
* 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003"
* 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]]
* 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s)
* 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]]
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s)
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment
* 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]]
* 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s)
* 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment
* 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]]
* 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
* 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
* 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
* 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
* 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
* 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
* 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
* 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]]
* {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}}
* 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment
* {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}}
* 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002
* 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
* {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]]
* 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.*
* 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]])
* 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]]
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
* 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
* 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]]
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie
* 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed
* 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]]
* 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]]
* 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s)
* 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment
* 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]]
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s)
* 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment
* 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable; likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance Ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]]
* 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s)
* 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001
* 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001
* 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie
* 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]]

== 2026-05-05 ==
* 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
* 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
* 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s)
* 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]]
* 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s)
* 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]]
* 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s)
* 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]]
* 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s)
* 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]]
* 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s)
* 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json
* 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json
* 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad
* 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json
* 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage
* 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet
* 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage
* 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet
* 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json
* 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
* 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004
* 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1004
* 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie
* 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet
* 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json
* 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
* 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
* 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
* 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
* 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
* 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox
* 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad
* 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw
* 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
* 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
* 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
* 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
* 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
* 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
* 14:03 Lucas_WMDE: UTC afternoon backport+config window done
* 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet
* 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution.
* 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet
* 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet
* 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json
* 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance
* 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json
* 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling
* 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
* 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet
* 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
* 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
* 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
* 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox
* 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s)
* 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet
* 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json
* 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw
* 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet
* 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]]
* 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json
* 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
* 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling
* 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox
* 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution.
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json
* 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
* 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json
* 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet
* 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet
* 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox
* 13:30 Msz2001: UTC afternoon backport window done
* 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json
* 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet
* 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
* 13:23 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
* 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s)
* 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
* 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
* 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]]
* 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json
* 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance
* 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json
* 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment
* 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug
* 13:15 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]]
* 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
* 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s)
* 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
* 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
* 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
* 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json
* 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment
* 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]]
* 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json
* 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s)
* 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]]
* 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json
* 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s)
* 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment
* 12:42 moritzm: installing node-tar security updates
* 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]]
* 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json
* 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance
* 12:36 moritzm: installing imagemagick security updates
* 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance
* 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json
* 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
* 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
* 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json
* 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
* 12:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json
* 12:04 moritzm: installing postgresql-13 security updates
* 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json
* 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s)
* 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet
* 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json
* 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
* 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json
* 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet
* 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]]
* 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s)
* 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json
* 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet
* 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet
* 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet
* 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]]
* 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json
* 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json
* 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json
* 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json
* 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance
* 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json
* 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json
* 11:10 moritzm: installing ca-certificates updates from bookworm point release
* 11:09 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie
* 11:07 moritzm: installing multipart bugfix updates from bookworm point release
* 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json
* 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica
* 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica
* 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json
* 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie
* 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json
* 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json
* 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
* 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
* 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json
* 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json
* 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
* 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json
* 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
* 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json
* 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json
* 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie
* 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
* 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
* 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie
* 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json
* 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie
* 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json
* 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json
* 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json
* 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
* 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
* 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie
* 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie
* 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
* 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
* 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie
* 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie
* 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie
* 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie
* 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie
* 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie
* 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json
* 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
* 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json
* 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json
* 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
* 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json
* 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
* 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json
* 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json
* 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json
* 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie
* 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json
* 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie
* 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json
* 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
* 08:50 moritzm: installing augeas security updates
* 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors
* 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002"
* 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002"
* 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json
* 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
* 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json
* 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]]
== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
2026-05-15T19:23:29Z Stashbot
== 2026-05-15 ==
* 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02
mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 
mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 
6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook 
sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet 
with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with 
chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host
db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 
vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]]
* 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie
* 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie
* 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie
* 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie
* 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie
* 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie
* 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm
* 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
* 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
* 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
* 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
* 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
* 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
* 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
* 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
* 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
* 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie
* 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]]
* 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
* 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:37 dzahn@dns1005: END - running authdns-update
* 18:35 dzahn@dns1005: START - running authdns-update
* 18:33 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1
* 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie
* 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm
* 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo
* 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
* 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002
* 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
* 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
* 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
* 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
* 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
* 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
* 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
* 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
* 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
* 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
* 17:32 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
* 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
* 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]]
* 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo
* 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
* 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
* 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
* 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
* 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
* 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
* 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
* 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
* 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
* 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
* 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
* 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
* 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
* 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
* 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
* 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
* 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
* 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
* 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
* 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
* 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
* 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work
* 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo
* 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]]
* 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
* 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie
* 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm
* 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm
* 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
* 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
* 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
* 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm
* 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm
* 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'"
* 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
* 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s)
* 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]]
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie
* 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change]
* 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change]
* 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s)
* 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
* 14:26 kharlan@deploy1003: kharlan: Continuing with deployment
* 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'"
* 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
* 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet
* 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm
* 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
* 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
* 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
* 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]]
* 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s)
* 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
* 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie
* 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]]
* 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s)
* 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet
* 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
* 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
* 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie
* 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002
* 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie
* 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]]
* 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
* 13:45 jgreen@dns1004: END - running authdns-update
* 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s)
* 13:44 jgreen@dns1004: START - running authdns-update
* 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
* 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm
* 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors
* 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors
* 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment
* 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage
* 13:31 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet']
* 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors
* 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS trixie
* 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors
* 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003"
* 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003"
* 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]]
* 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie
* 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie
* 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie
* 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie
* 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie
* 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update
* 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
* 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage
* 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage
* 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie
* 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie
* 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie
* 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie
* 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s)
* 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie
* 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]]
* 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage
* 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
* 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
* 11:50 moritzm: installing openjdk-17 security updates
* 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
* 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie
* 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot
* 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie
* 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage
* 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie
* 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
* 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm
* 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
* 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
* 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json
* 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie
* 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json
* 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
* 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie
* 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie
* 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json
* 11:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie
* 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie
* 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot
* 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
* 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
* 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
* 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
* 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
* 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm
* 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
* 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
* 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
* 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet']
* 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json
* 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance
* 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json
* 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie
* 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie
* 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie
* 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json
* 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json
* 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json
* 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
* 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
* 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003"
* 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json
* 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance
* 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003"
* 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json
* 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it don't exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s) * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s) * 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]]
* 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json
* 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json
* 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s)
* 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json
* 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment
* 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement
* 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
* 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
* 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]]
* 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json
* 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json
* 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
* 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json
* 15:10 papaul: ongoing switch refresh in ULSFO
* 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s)
* 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie * 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie * 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie * 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-03 == * 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s) * 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]]
== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>
2fxtmofhal9604v1ovc09q9vdqm492d 2414295 2414294 2026-05-15T19:30:56Z Stashbot 7414 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED 2414295 wikitext text/x-wiki
== 2026-05-15 ==
* 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
* 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
* 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
* 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
* 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts
* 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s)
* 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003"
* 12:18
cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: 
END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: 
START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host 
sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 
6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 
[[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera 
generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED 
* 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 == 2026-05-13 == * 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]]) * 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s) * 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment * 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s) * 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. 
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] == 2026-05-11 == * 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s) * 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] * 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s) * 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] * 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s) * 21:47 cjming@deploy1003: cjming: Continuing with deployment * 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable; we'll likely need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 2414296 2414295 2026-05-15T19:32:22Z Stashbot 7414 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm == 2026-05-15 == * 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" 
* 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 
elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 
elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for 
host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 
6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 
[[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera 
generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED 
* 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]]
* 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s)
* 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm
* 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment
* 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]]
* 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm
* 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s)
* 20:46 sbisson@deploy1003: sbisson: Continuing with deployment
* 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]]
* 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
* 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
* 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s)
* 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment
* 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm
* 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm
* 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]]
* 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
* 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm
* 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s)
* 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
* 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment
* 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]]
* 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm
* 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm
* 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
* 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286
* 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286
* 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
* 19:26 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
* 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
* 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm
* 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm
* 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
* 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
* 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm
* 17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 17:10 cmooney@dns2005: END - running authdns-update
* 17:09 cmooney@dns2005: START - running authdns-update
* 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie
* 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
* 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
* 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
* 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie
* 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289
* 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s)
* 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:12 bearloga@deploy1003: bearloga: Continuing with deployment
* 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]]
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm
* 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm
* 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json
* 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289
* 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s)
* 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285
* 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:29 phuedx@deploy1003: phuedx: Continuing with deployment
* 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json
* 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]]) * 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s) * 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment * 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s) * 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. 
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for ulsfo network new switches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for ulsfo network new switches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]]
* 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]]
* 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s)
* 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment
* 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]]
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s)
* 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment
* 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" == 2026-05-12 == * 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]]
* 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
* 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
* 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s)
* 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s)
* 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271
* 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment
* 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] == 2026-05-11 == * 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s) * 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] * 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s) * 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] * 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s) * 21:47 cjming@deploy1003: cjming: Continuing with deployment * 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s) * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s) * 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]]
* 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json
* 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json
* 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s)
* 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json
* 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment
* 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement
* 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
* 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
* 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]]
* 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json
* 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json
* 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
* 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json
* 15:10 papaul: ongoing switch refresh in ULSFO
* 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s)
* 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]]
* 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie
* 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json
* 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json
* 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts
* 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts
* 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
* 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
* 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json
* 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
* 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh
* 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh
* 14:25 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh
* 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001
* 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001
* 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001
* 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003"
* 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003"
* 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json
* 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox
* 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001
* 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie
* 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json
* 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]]
* 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]]
* 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]]
* 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s)
* 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
* 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
* 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
* 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002"
* 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
* 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
* 13:55 sbisson@deploy1003: sbisson: Continuing with deployment
* 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable
* 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]]
* 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org
* 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json
* 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s)
* 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment
* 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]]
* 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json
* 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json
* 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
* 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json
* 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json
* 13:13 moritzm: installing jaraco.context security updates
* 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet
* 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm
* 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json
* 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json
* 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
* 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
* 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic
* 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json
* 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
* 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json
* 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
* 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage
* 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage
* 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json
* 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json
* 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json
* 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json
* 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
* 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json
* 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json
* 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm
* 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002"
* 11:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002"
* 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors
* 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors
* 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002"
* 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json
* 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002"
* 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet
* 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet
* 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json
* 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to /var/cache/conftool/dbconfig/20260504-113620-fceratto.json
* 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
* 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json
* 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie
* 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage
* 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage
* 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json
* 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json
* 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json
* 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance
* 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json
* 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
* 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json
* 10:48 moritzm: installing bash updates from trixie point release
* 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json
* 10:42 moritzm: installing postgresql-17 security updates
* 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie
* 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie
* 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm
* 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json
* 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002"
* 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002"
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors
* 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002"
* 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002"
* 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json
* 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet
* 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json
* 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
* 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json
* 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance
* 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage
* 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage
* 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to /var/cache/conftool/dbconfig/20260504-100818-fceratto.json
* 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie
* 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie
* 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie
* 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie
* 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json
* 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org
* 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json
* 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org
* 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json
* 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
* 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json
* 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
* 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
* 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
* 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply
* 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie
* 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json
* 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json
* 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events
* 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events
* 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events
* 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json
* 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json
* 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
* 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json
* 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie
* 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json
* 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet
* 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet
* 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet
* 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet
* 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet
* 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet
* 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet
* 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1001.eqiad.wmnet
* 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet
* 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet
* 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json
* 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie
* 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts
* 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json
* 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
* 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json
* 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
* 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
* 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
* 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage
* 08:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage
* 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
* 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s)
* 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
* 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync
* 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync
* 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
* 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts
* 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment
* 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-05-15 == * 19:47 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate
netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with 
OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: 
apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with 
reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - 
Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 
marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for 
host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage
* 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage
* 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm
* 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP
reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with 
chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 == 2026-05-13 == * 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]]) * 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s) * 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment * 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s) * 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. 
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]]
* 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
* 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
* 15:39 dzahn@dns1005: END - running authdns-update
* 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore
* 15:37 dzahn@dns1005: START - running authdns-update
* 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
* 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
* 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
* 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
* 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
* 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
* 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s)
* 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json
* 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw
* 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet
* 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]]
* 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json
* 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
* 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling
* 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox
* 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution.
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]]
* 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json
* 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s)
* 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]]
* 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json
* 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s)
* 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment
* 12:42 moritzm: installing node-tar security updates
* 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie * 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie * 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie * 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-03 == * 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s) * 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] * 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s) * 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] == 2026-05-02 == * 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s) * 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment * 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] * 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s) * 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment * 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]]
== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>
b7j69lg97y6d16g1yniq3dbn4d66u59 2414298 2414297 2026-05-15T19:53:14Z Stashbot 7414 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage 2414298 wikitext text/x-wiki
== 2026-05-15 ==
* 19:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:47 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 
cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook 
sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces 
(exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 
08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - 
Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 
vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP 
reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]]
* 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
* 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm
* 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s)
* 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
* 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment
* 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]]
* 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm
* 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm
* 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
* 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286
* 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286
* 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
* 19:26 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003"
* 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage
* 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm
* 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm
* 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
* 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage
* 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm
* 17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
* 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
* 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 17:10 cmooney@dns2005: END - running authdns-update
* 17:09 cmooney@dns2005: START - running authdns-update
* 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]]
* 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie
* 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
* 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
* 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
* 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie
* 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289
* 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s)
* 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:12 bearloga@deploy1003: bearloga: Continuing with deployment
* 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]]
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm
* 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm
* 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json
* 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289
* 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s)
* 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285
* 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:29 phuedx@deploy1003: phuedx: Continuing with deployment
* 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json
* 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]]
* 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json
* 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s)
* 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
* 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:45 krinkle@deploy1003: krinkle: Continuing with deployment
* 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
* 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm
* 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]]
* 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s)
* 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment
* 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned
* 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned
* 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned
* 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned
* 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]]
* 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s)
* 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]]
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s)
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment
* 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274
* 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable; we'll likely need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]]
== 2026-05-05 ==
* 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it don't exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw
* 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet
* 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]]
* 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003"
* 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json
* 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
* 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling
* 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox
* 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution.
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]]
* 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json
* 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s)
* 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]]
* 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json
* 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s)
* 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment
* 12:42 moritzm: installing node-tar security updates
* 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet
* 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]]
* 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s)
* 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json
* 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet
* 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet
* 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json
* 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json
* 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
* 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json
* 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
* 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json
* 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json
* 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie
* 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
* 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
* 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie
* 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json
* 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie
* 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s) * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s) * 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> ekcxthjktpcton43gyfzciz89u9kyuy 2414299 2414298 2026-05-15T20:09:19Z Stashbot 7414 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" 2414299 wikitext text/x-wiki == 2026-05-15 == * 20:09 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:47 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook 
sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link 
[[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: 
Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to 
https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis 
set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: 
START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook 
sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]]
* 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s)
* 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment
* 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]]
* 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s)
* 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm
* 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment
* 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]]
* 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm
* 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s)
* 20:46 sbisson@deploy1003: sbisson: Continuing with deployment
* 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]]
* 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
* 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage
* 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s)
* 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment
* 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm
* 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm
* 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]]
* 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
* 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm
* 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s)
* 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage
* 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment
* 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]]
* 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]]
* 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s)
* 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]]
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s)
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment
* 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]]
* 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.*
* 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]])
* 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]]
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
* 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
* 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]]
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie
* 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed
* 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]]
* 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]]
* 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s)
* 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment
* 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm
* 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s)
* 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271
* 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment
* 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage
* 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage
* 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie
* 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023
* 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023
* 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003"
* 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003"
* 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox
== 2026-05-07 ==
* 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie
* 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage
*
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage
* 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie
* 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s)
* 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19]
* 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s)
* 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19]
* 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s)
* 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19]
* {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}}
* 21:23 cscott@deploy1003: cscott: Continuing with deployment
* 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t
* {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}}
* 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie
* 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s)
* 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment
* 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug).
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance Ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it don't exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]

== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]

== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-05-15 == * 20:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:09 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:47 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003:
START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 
mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage 
(exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and 
previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision 
(exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 
vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 
vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END 
(FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]]
* 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s)
* 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]]
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s)
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment
* 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]]
* 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.*
* 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]])
* 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]]
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
* 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
* 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]]
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie
* 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed
* 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie
* 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs [[phab:T424611|T424611]]
* 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
* 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
* 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs [[phab:T424611|T424611]]
* 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage
* 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
* 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply
* 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage
* 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts
* 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie
* 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie
* 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie
* 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet
* 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet
* 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade
* 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie
* 10:55 jiji@deploy1003: helmfile
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]]
* 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]]
* 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s)
* 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment
* 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]]
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s)
* 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment
* 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" == 2026-05-12 == * 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]

== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]]
== 2026-05-05 ==
* 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
* 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
* 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s)
* 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]]
* 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s)
* 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it don't exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" == 2026-05-01 == * 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]]

== Other archives ==
See [[Server Admin Log/Archives]].

<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>

5x7pmyhiz3qt4stuix77twvvgezmoxu 2414301 2414300 2026-05-15T20:13:01Z Stashbot 7414 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1290.eqiad.wmnet with OS bookworm 2414301 wikitext text/x-wiki

== 2026-05-15 ==
* 20:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1290.eqiad.wmnet with OS bookworm * 20:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:09 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:47 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook 
sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 
elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff 
saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 
elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: 
Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy 
FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with 
Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289
* 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s)
* 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285
* 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:29 phuedx@deploy1003: phuedx: Continuing with deployment
* 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json
* 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]]
* 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json
* 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s)
* 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
* 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:45 krinkle@deploy1003: krinkle: Continuing with deployment
* 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
* 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm
* 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]]
* 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s)
* 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment
* 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned
* 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned
* 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned
* 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned
* 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm
* 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]]
* 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283
* 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283
* 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s)
* 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003"
* 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003"
* 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:12 sbisson@deploy1003: sbisson: Continuing with deployment
* 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover
* 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282
* 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]]
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json
* 13:07 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1282
* 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003"
* 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003"
* 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json
* 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281
* 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]]
* 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]])
* 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281
* 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003"
* 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003"
* 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280
* 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280
* 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003"
* 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003"
* 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json
* 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279
* 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 [[phab:T426291|T426291]]
* 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279
* 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003"
* 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003"
* 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003
* 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003
* 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458
* 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458
* 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json
* 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
* 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
* 12:20 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json
* 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json
* 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
* 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
* 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
* 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
* 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
* 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync
* 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync
* 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply
* 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply
* 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye
* 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye
* 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to /var/cache/conftool/dbconfig/20260514-104521-marostegui.json
* 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
* 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
* 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage
* 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply
* 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply
* 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage
* 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage
* 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage
* 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
* 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
* 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
* 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply
* 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye
* 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye
* 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
* 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
* 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned
* 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned
* 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned
* 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
* 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
* 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned
* 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye
* 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye
* 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye
* 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye
* 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage
* 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]]
* 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage
* 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage
* 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage
* 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage
* 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage
* 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage
* 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage
* 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
* 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply
* 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye
* 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye
* 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye
* 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye
* 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json
* 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]]
* 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]])
* 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3
* 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]]
* 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
* 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
* 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]]
* 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie
* 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie
* 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie
* 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage
* 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage
* 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie
* 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie
* 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie
* 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie
* 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie
* 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003

== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]]
* 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s)
* 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]]
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s)
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment
* 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]]
* 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]]
* 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s)
* 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment
* 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]]
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s)
* 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment
* 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" == 2026-05-12 == * 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]]
* 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
* 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
* 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s)
* 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42 * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * 12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425 * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]]
* 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]]
* 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
* 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
* 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s)
* 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment
* 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]]
* 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
* 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
* 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003"
* 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003"
* 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm
* 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s)
* 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment
* 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]]
== 2026-05-05 ==
* 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
* 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
* 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s)
* 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]]
* 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s)
* 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it don't exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]]
* 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json
* 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json
* 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s)
* 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json
* 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment
* 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement
* 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
* 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
* 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]]
* 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json
* 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json
* 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
* 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json
* 15:10 papaul: ongoing switch refresh in ULSFO
* 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s)
* 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie * 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie * 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie * 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s) * 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] * 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s) * 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s) * 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment * 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] * 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s) * 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment * 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" == 2026-05-01 == * 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> fksi4c3p1uqaij5ynjk54cd9o9qdzc2 2414302 2414301 2026-05-15T20:55:50Z Stashbot 7414 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1287940|Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580)]] 2414302 wikitext text/x-wiki == 2026-05-15 == * 20:55 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1287940{{!}}Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580)]] * 20:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1290.eqiad.wmnet with OS bookworm * 20:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:09 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:47 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade 
(exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: 
apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 
09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with 
Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet 
with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm
* 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
* 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
* 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with
chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for 
[[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie
* 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]]
* 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
* 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
* 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm
* 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003"
* 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
* 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290
* 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003"
* 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage
* 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm
* 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie
* 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289
* 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003"
* 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm
* 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage
* 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s)
* 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:12 bearloga@deploy1003: bearloga: Continuing with deployment
* 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]]
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage
* 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm
* 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288
* 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm
* 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
* 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003"
* 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm
* 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json
* 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003"
* 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289
* 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289
* 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm
* 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s)
* 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage
* 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285
* 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003"
* 14:29 phuedx@deploy1003: phuedx: Continuing with deployment
* 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json
* 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]]
* 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json
* 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned
* 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s)
* 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
* 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:45 krinkle@deploy1003: krinkle: Continuing with deployment
* 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage
* 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm
* 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]]
* 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s)
* 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment
* 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned
* 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned
* 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned
* 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned
* 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm
* 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]]
* 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283
* 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283
* 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s)
* 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003"
* 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003"
* 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:12 sbisson@deploy1003: sbisson: Continuing with deployment
* 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover
* 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282
* 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]]
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json
* 13:07 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1282
* 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003"
* 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003"
* 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json
* 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281
* 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]]
* 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]])
* 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281
* 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003"
* 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003"
* 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280
* 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280
* 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003"
* 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003"
* 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json
* 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279
* 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 [[phab:T426291|T426291]]
* 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279
* 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003"
* 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003"
* 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003
* 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003
* 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458
* 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458
* 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json
* 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
* 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
* 12:20 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json
* 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json
* 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
* 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
* 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
* 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
* 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm
* 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync
* 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync
* 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
* 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
* 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply
* 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply
* 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye
* 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye
* 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 == 2026-05-13 == * 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]]) * 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s) * 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment * 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s) * 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. 
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]]
* 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s)
* 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]]
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s)
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment
* 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]]
* 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.*
* 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]])
* 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]]
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0)
* 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
* 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
* 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]]
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003"
* 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie
* 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed
* 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]]
* 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]]
* 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s)
* 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment
* 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]]
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s)
* 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment
* 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" == 2026-05-12 == * 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]]
* 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
* 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
* 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s)
* 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
* 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
* 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s)
* 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271
* 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment
* 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]]
* 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]]
* 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
* 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
* 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s)
* 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment
* 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]]
* 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
* 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
* 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003"
* 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003"
* 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm
* 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s)
* 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment
* 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s) * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s) * 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie * 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie * 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie * 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s) * 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] * 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s) * 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s) * 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment * 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] * 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s) * 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment * 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-05-15 == * 20:57 jforrester@deploy1003: jforrester, seddon: Backport for [[gerrit:1287940{{!}}Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:55 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1287940{{!}}Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580)]] * 20:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1290.eqiad.wmnet with OS bookworm * 20:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:09 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:47 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 19:30 vriley@cumin1003: END 
(PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 
167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] 
DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host 
sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 
mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS 
trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - 
Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook 
sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision 
(exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 
vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]]
* 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]]
* 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build)
* 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s)
* 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment
* 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]]
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s)
* 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment
* 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274
* 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]]
* 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
* 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage
* 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s)
* 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it don't exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" == 2026-05-01 == * 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 2414304 2414303 2026-05-15T20:59:13Z Stashbot 7414 jforrester@deploy1003: jforrester, seddon: Continuing with deployment == 2026-05-15 == * 20:59 jforrester@deploy1003: jforrester, seddon: Continuing with deployment * 20:57 jforrester@deploy1003: jforrester, seddon: Backport for [[gerrit:1287940{{!}}Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:55 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1287940{{!}}Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580)]] * 20:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1290.eqiad.wmnet with OS bookworm * 20:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:09 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:47 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
and with Dell SCP reboot policy FORCED * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: 
apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache 
ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 
10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: 
Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 
09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to /var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 
elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host 
db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-14 == * 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook 
sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap 
sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]]
* 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284
* 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json
* 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
* 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm
* 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json
* 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003"
* 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json
* 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage
* 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s)
* 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply
* 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json
* 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]]
* 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
* 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json
* 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s)
* 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm
* 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage
* 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json
* 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
* 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]]
* 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]]) * 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s) * 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment * 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s) * 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for uslfo network new swtiches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drms and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drms and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274 * 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274 * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003" * 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" == 2026-05-12 == * 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage * 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]]
* 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s)
* 00:31 zabe@deploy1003: zabe: Continuing with deployment
* 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]]
== 2026-05-06 ==
* 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie
* 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie
* 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s)
* 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s)
* 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s)
* 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm * 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm * 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage * 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm * 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm * 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. 
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'" * 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution. * 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s) * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie * 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change] * 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:26 kharlan@deploy1003: kharlan: Continuing with deployment * 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'" * 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage * 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie
* 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update
* 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json
* 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]]
* 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99)
* 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
* 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]]
* 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json
* 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json
* 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s)
* 08:59 zabe@deploy1003: zabe: Continuing with deployment
* 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]]
* 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
* 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s)
* 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001
* 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001
* 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie
* 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]]
== 2026-05-05 ==
* 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
* 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002"
* 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s)
* 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]]
* 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s)
* 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s)
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002"
* 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s)
* 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]

== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie
* 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
* 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
== 2026-05-01 ==
* 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage
* 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie
* 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie
* 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie
* 23:25 jhancock@cumin2002: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
== 2026-05-15 == * 21:03 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287940{{!}}Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580)]] (duration: 07m 43s) * 20:59 jforrester@deploy1003: jforrester, seddon: Continuing with deployment * 20:57 jforrester@deploy1003: jforrester, seddon: Backport for [[gerrit:1287940{{!}}Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:55 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1287940{{!}}Revert "Enable wgTrackMediaRequestProvenance on remaining Wikipedias" (T425580)]] * 20:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1290.eqiad.wmnet with OS bookworm * 20:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:09 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:47 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1290.eqiad.wmnet with reason: host reimage * 19:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 19:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 19:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 19:21 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 16:02 dancy@deploy1003: 
Installation of scap version "4.265.1" completed for 2 hosts * 16:00 dancy@deploy1003: Installing scap version "4.265.1" for 2 host(s) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove IPs that had been used for ulsfo cr links from dns - cmooney@cumin1003" * 12:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet * 11:59 Emperor: depool / restart swift / repool on ms-fe2010 ms-fe2012 * 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:34 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2065.codfw.wmnet with OS bullseye * 11:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:10 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 11:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: host reimage * 10:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2064.codfw.wmnet with OS bullseye * 10:52 atsuko@deploy1003: helmfile 
[dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:46 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie * 10:43 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2065 * 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2065 * 10:40 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2065 * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2065.codfw.wmnet 167.48.192.10.in-addr.arpa 7.6.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:40 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2065 - mvernon@cumin2002" * 10:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 10:36 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2065 * 10:35 mvernon@cumin2002: START - Cookbook 
sre.hosts.reimage for host ms-be2065.codfw.wmnet with OS bullseye * 10:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:31 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:28 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:23 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:22 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 10:20 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: host reimage * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify entries for ulsfo router interfaces - cmooney@cumin1003" * 10:10 topranks: Migrate ulsfo cr<->cr traffic to use path via switches not direct link [[phab:T424611|T424611]] * 10:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 10:04 
mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 10:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 10:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 10:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:56 topranks: Migrate cr3-ulsfo link to asw1-22-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:32 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2064.codfw.wmnet with OS bullseye * 09:32 topranks: Migrate cr4-ulsfo link to asw1-23-ulsfo to tagged interface [[phab:T424611|T424611]] * 09:30 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 09:30 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2065 * 09:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 09:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on db2218.codfw.wmnet with reason: Host crashed [[phab:T426383|T426383]] * 09:08 mvernon@cumin2002: END (PASS) - Cookbook 
sre.hosts.move-vlan (exit_code=0) for host ms-be2064 * 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2064 * 09:06 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2064 * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2064.codfw.wmnet 56.32.192.10.in-addr.arpa 6.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2064 - mvernon@cumin2002" * 09:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 09:02 mvernon@cumin2002: START - Cookbook sre.dns.netbox * 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2064 * 09:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2064.codfw.wmnet with OS bullseye * 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92553 and previous config saved to /var/cache/conftool/dbconfig/20260515-090000-marostegui.json * 08:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92552 and previous config saved to 
/var/cache/conftool/dbconfig/20260515-085836-marostegui.json * 08:56 marostegui: Starting s7 codfw failover from db2218 to db2220 - [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426380|T426380]] * 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 [[phab:T426380|T426380]]', diff saved to https://phabricator.wikimedia.org/P92551 and previous config saved to /var/cache/conftool/dbconfig/20260515-085420-marostegui.json * 08:41 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2065 * 08:41 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2064 * 08:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:17 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 08:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 08:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: END (FAIL) - Cookbook 
sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:55 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2064 * 07:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 07:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2010 * 07:39 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2010 * 07:10 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:31 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s) * 02:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1290.eqiad.wmnet with OS bookworm * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS 
bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1289.eqiad.wmnet with OS bookworm * 01:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1289.eqiad.wmnet with reason: host reimage * 00:43 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:42 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:14 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1290.eqiad.wmnet with OS bookworm * 00:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:01 vriley@cumin1003: END 
(PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
== 2026-05-14 ==
* 23:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:57 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 23:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 23:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:49 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:39 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and 
with Dell SCP reboot policy FORCED * 23:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 23:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:47 egardner@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] (duration: 07m 14s) * 21:43 egardner@deploy1003: egardner: Continuing with deployment * 21:41 egardner@deploy1003: egardner: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:40 egardner@deploy1003: Started scap sync-world: Backport for [[gerrit:1287488{{!}}Share Highlight: overdraw photo on share card canvas (T426344)]] * 21:33 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] (duration: 09m 15s) * 21:29 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 21:26 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:24 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1287485{{!}}Disable Reading Lists survey for Wikipedias (T421776)]] * 21:16 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] (duration: 06m 33s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1286.eqiad.wmnet with OS bookworm * 21:15 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:12 dreamyjazz@deploy1003: dreamyjazz, seddon: Continuing with deployment * 21:11 dreamyjazz@deploy1003: dreamyjazz, seddon: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:10 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1287479{{!}}Enable hCaptcha for account creation API on group 0 wiki's]], [[gerrit:1287484{{!}}Remove DynamicPageList from legalteamwiki as unused]] * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1287.eqiad.wmnet with OS bookworm * 20:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:55 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] (duration: 07m 03s) * 20:46 sbisson@deploy1003: sbisson: Continuing with deployment * 20:45 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287427{{!}}Simplewiki: include article wizard in AG experiment (T426278)]] * 20:43 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1287.eqiad.wmnet with reason: host reimage * 20:35 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] (duration: 10m 18s) * 20:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 cjming@deploy1003: cjming, neriah: Continuing with deployment * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1289.eqiad.wmnet with OS bookworm * 20:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1289.eqiad.wmnet with OS bookworm * 20:27 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1287002{{!}}Disable wgNewUserMessageOnAutoCreate on all WMF wikis (T426206)]] * 20:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:19 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1287.eqiad.wmnet with OS bookworm * 20:19 jsn@deploy1003: Finished scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] (duration: 07m 48s) * 20:18 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1286.eqiad.wmnet with reason: host reimage * 20:14 jsn@deploy1003: kgraessle, jsn: Continuing with deployment * 20:13 jsn@deploy1003: kgraessle, jsn: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:11 jsn@deploy1003: Started scap sync-world: Backport for [[gerrit:1192921{{!}}Enable AutoModerator on Italian Wikipedia (T405152)]], [[gerrit:1286974{{!}}Enable AutoModerator on Albanian Wikipedia (T420450)]], [[gerrit:1286975{{!}}Enable AutoModerator on Dutch Wikipedia (T425509)]] * 20:03 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 20:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1286.eqiad.wmnet with OS bookworm * 19:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1281.eqiad.wmnet with OS bookworm * 19:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1286.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1286 * 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1286 * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:26 vriley@cumin1003: START 
- Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1286] - vriley@cumin1003" * 19:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1281.eqiad.wmnet with reason: host reimage * 19:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1274.eqiad.wmnet with OS bookworm * 19:14 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1281.eqiad.wmnet with OS bookworm * 18:58 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 18:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:25 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1274.eqiad.wmnet with reason: host reimage * 18:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:16 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:14 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 
17:32 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 17:31 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 17:23 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:17 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply * 17:17 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply * 17:16 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply * 17:15 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply * 17:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 17:10 cmooney@dns2005: END - running authdns-update * 17:09 cmooney@dns2005: START - running authdns-update * 17:06 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:58 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - [[phab:T426298|T426298]] * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:49 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 16:36 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply 
* 16:35 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 16:31 topranks: disable core router direct link at esams now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:25 topranks: disable core router direct link at drmrs now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:21 topranks: disable core router direct link at magru now that traffic is flowing via switches [[phab:T424611|T424611]] * 16:20 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 16:20 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 16:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:15 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:14 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1288.eqiad.wmnet with OS bookworm * 16:13 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 16:07 cmooney@cumin1003: 
END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove records for deleted IPs esams,drmrs and magru - cmooney@cumin1003" * 16:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 16:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply * 15:59 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply * 15:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1290.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:56 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1290 * 15:55 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1290 * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:55 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1290] - vriley@cumin1003" * 15:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
db1288.eqiad.wmnet with reason: host reimage * 15:51 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:50 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1288.eqiad.wmnet with reason: host reimage * 15:49 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.2 - cmooney@cumin1003 * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1285.eqiad.wmnet with OS bookworm * 15:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:46 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 15:45 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1289.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1289 * 15:41 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:41 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1289] - vriley@cumin1003" * 15:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
update mgmt [db1289] - vriley@cumin1003" * 15:35 vriley@cumin1003: START - Cookbook sre.dns.netbox * 15:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1288.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1284.eqiad.wmnet with OS bookworm * 15:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 15:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 15:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1285.eqiad.wmnet with reason: host reimage * 15:16 bearloga@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] (duration: 06m 20s) * 15:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:12 bearloga@deploy1003: bearloga: Continuing with deployment * 15:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:12 bearloga@deploy1003: bearloga: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] synced to 
the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:10 bearloga@deploy1003: Started scap sync-world: Backport for [[gerrit:1287422{{!}}EventStreamConfig: fix product_metrics.web_base (T426209)]] * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1284.eqiad.wmnet with reason: host reimage * 15:08 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1285.eqiad.wmnet with OS bookworm * 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1288.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92544 and previous config saved to /var/cache/conftool/dbconfig/20260514-145715-fceratto.json * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1288 * 14:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1283.eqiad.wmnet with OS bookworm * 14:54 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage * 14:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1288 * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1288] - vriley@cumin1003" * 14:52 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1284.eqiad.wmnet with OS bookworm * 14:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92542 and previous config saved to /var/cache/conftool/dbconfig/20260514-144707-fceratto.json * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1287.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:44 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1285.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 
14:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1287] - vriley@cumin1003" * 14:37 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1289 * 14:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1289 * 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92541 and previous config saved to /var/cache/conftool/dbconfig/20260514-143659-fceratto.json * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1282.eqiad.wmnet with OS bookworm * 14:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:34 phuedx@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] (duration: 11m 14s) * 14:33 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1283.eqiad.wmnet with reason: host reimage * 14:33 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1285 * 14:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1285 * 14:31 vriley@cumin1003: END 
(PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:31 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1285] - vriley@cumin1003" * 14:29 phuedx@deploy1003: phuedx: Continuing with deployment * 14:27 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92540 and previous config saved to /var/cache/conftool/dbconfig/20260514-142650-fceratto.json * 14:26 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 14:24 phuedx@deploy1003: phuedx: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1280.eqiad.wmnet with OS bookworm * 14:23 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:22 phuedx@deploy1003: Started scap sync-world: Backport for [[gerrit:1287368{{!}}ext.wikimediaEvents: Add synth-aa-ncs-1 experiment (T419514)]] * 14:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1284.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1284 * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92539 and previous config saved to /var/cache/conftool/dbconfig/20260514-141922-fceratto.json * 14:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance * 14:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1283.eqiad.wmnet with OS bookworm * 14:18 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2150 from dbctl [[phab:T424342|T424342]]', diff saved to https://phabricator.wikimedia.org/P92538 and previous config saved to /var/cache/conftool/dbconfig/20260514-141812-cwilliams.json * 14:17 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1284 * 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 
* 14:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:17 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1284] - vriley@cumin1003" * 14:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92537 and previous config saved to /var/cache/conftool/dbconfig/20260514-141644-fceratto.json * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1282.eqiad.wmnet with reason: host reimage * 14:14 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] (duration: 08m 00s) * 14:13 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:09 krinkle@deploy1003: krinkle, robertsky: Continuing with deployment * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:08 krinkle@deploy1003: krinkle, robertsky: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1279.eqiad.wmnet with OS bookworm * 14:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P92536 and previous config saved to /var/cache/conftool/dbconfig/20260514-140635-fceratto.json * 14:06 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1287367{{!}}throttle rule for ESEAP Conference 2026 15-18 May 2026 (T426295)]] * 14:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 14:01 cwilliams@cumin1003: dbctl commit (dc=all): 'Remove db2151 from dbctl [[phab:T424343|T424343]]', diff saved to https://phabricator.wikimedia.org/P92535 and previous config saved to /var/cache/conftool/dbconfig/20260514-140110-cwilliams.json * 14:00 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] (duration: 07m 09s) * 13:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1282.eqiad.wmnet with OS bookworm * 13:58 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1280.eqiad.wmnet with reason: host reimage * 13:57 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:56 mfossati@deploy1003: mfossati: Continuing with deployment * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92534 and previous config saved to /var/cache/conftool/dbconfig/20260514-135626-fceratto.json * 13:56 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 13:56 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie * 13:56 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:55 mfossati@deploy1003: mfossati: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 13:54 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 13:53 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1287363{{!}}Scale share-highlight card to fit small viewports (T426247)]] * 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2152.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:53 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2165 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92533 and previous config saved to /var/cache/conftool/dbconfig/20260514-135315-fceratto.json * 13:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance * 13:53 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:52 cwilliams@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2150.codfw.wmnet with reason: Depooled host, will be decommissioned * 13:49 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] (duration: 07m 03s) * 13:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:45 krinkle@deploy1003: krinkle: Continuing with deployment * 13:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1279.eqiad.wmnet with reason: host reimage * 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:44 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance 
on remaining Wikipedias (T414338)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1280.eqiad.wmnet with OS bookworm * 13:42 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269442{{!}}Enable wgTrackMediaRequestProvenance on remaining Wikipedias (T414338)]] * 13:42 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] (duration: 12m 33s) * 13:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 13:38 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:37 krinkle@deploy1003: krinkle, annet: Continuing with deployment * 13:33 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2151: Host will be decommissioned * 13:33 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2151: Host will be decommissioned * 13:32 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Host will be decommissioned * 13:31 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Host will be decommissioned * 13:31 krinkle@deploy1003: krinkle, annet: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:30 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1279.eqiad.wmnet with OS bookworm * 13:29 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1285913{{!}}Add ReadingLists Account Creation CTA campaign (T422169)]], [[gerrit:1286327{{!}}WelcomeSurvey: Respect returnTo for campaigns skipping the survey (T422169)]] * 13:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:20 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1283.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1283 * 13:19 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:18 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1283 * 13:16 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] (duration: 08m 10s) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1283] - vriley@cumin1003" * 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision 
(exit_code=0) for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 sbisson@deploy1003: sbisson: Continuing with deployment * 13:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1282.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:10 sbisson@deploy1003: sbisson: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:10 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2165: Repooling after switchover * 13:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1282 * 13:08 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287043{{!}}Enable the Article Guidance experiment on simplewiki (T426278)]] * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db2165: Repooling after switchover * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Set correct weight [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92529 and previous config saved to /var/cache/conftool/dbconfig/20260514-130743-fceratto.json * 13:07 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host db1282 * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1282] - vriley@cumin1003" * 13:05 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1281.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:02 fceratto@cumin1003: dbctl commit (dc=all): 'Promote db2161 to s8 primary [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92528 and previous config saved to /var/cache/conftool/dbconfig/20260514-130213-fceratto.json * 13:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1281 * 13:00 federico3: Starting s8 codfw failover from db2165 to db2161 - [[phab:T426291|T426291]] * 13:00 kart_: Updated cxserver to 2026-05-14-123010-production ([[phab:T426174|T426174]], [[phab:T404298|T404298]]) * 12:59 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1281 * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:59 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:59 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update mgmt [db1281] - vriley@cumin1003" * 12:58 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:57 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1280.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:55 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:55 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1280 * 12:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1280 * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1280] - vriley@cumin1003" * 12:50 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1279.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 12:50 fceratto@cumin1003: dbctl commit (dc=all): 'Set db2161 with weight 0 [[phab:T426291|T426291]]', diff saved to https://phabricator.wikimedia.org/P92527 and previous config saved to /var/cache/conftool/dbconfig/20260514-125014-fceratto.json * 12:49 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1279 * 12:49 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary 
switchover s8 [[phab:T426291|T426291]] * 12:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1279 * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1279] - vriley@cumin1003" * 12:47 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:46 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:42 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:40 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: update bgp groups for dse-k8s-wdqs - cmooney@cumin1003 * 12:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 28458 * 12:27 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 28458 * 12:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repool pc3 with pc2023 as codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92526 and previous config saved to /var/cache/conftool/dbconfig/20260514-122707-marostegui.json * 12:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 12:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 12:20 marostegui@cumin1003: dbctl commit (dc=all): 
'Add pc2023 to pc3 codfw master [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92525 and previous config saved to /var/cache/conftool/dbconfig/20260514-121958-marostegui.json * 12:18 marostegui@cumin1003: dbctl commit (dc=all): 'Add pc2023 to pc3 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92524 and previous config saved to /var/cache/conftool/dbconfig/20260514-121839-marostegui.json * 11:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 11:31 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 11:08 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 11:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 11:01 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: sync * 11:00 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply * 11:00 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1063.eqiad.wmnet with OS bullseye * 10:49 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1069.eqiad.wmnet with OS bullseye * 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2152 from dbctl [[phab:T424344|T424344]]', diff saved to https://phabricator.wikimedia.org/P92523 and previous config saved to 
/var/cache/conftool/dbconfig/20260514-104521-marostegui.json * 10:41 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'. * 10:40 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'. * 10:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply * 10:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1063.eqiad.wmnet with reason: host reimage * 10:27 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1069.eqiad.wmnet with reason: host reimage * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:25 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:19 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1063.eqiad.wmnet with OS bullseye * 10:14 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1069.eqiad.wmnet with OS bullseye * 10:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 10:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with 
OS bookworm * 10:02 cwilliams@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2152: Host will be decommissioned * 10:02 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152: Host will be decommissioned * 09:54 cwilliams@cumin1003: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) depool db2152.codfw.wmnet: Host will be decommissioned * 09:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply * 09:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply * 09:49 cwilliams@cumin1003: START - Cookbook sre.mysql.depool depool db2152.codfw.wmnet: Host will be decommissioned * 09:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1067.eqiad.wmnet with OS bullseye * 09:33 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1065.eqiad.wmnet with OS bullseye * 09:30 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1068.eqiad.wmnet with OS bullseye * 09:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1066.eqiad.wmnet with OS bullseye * 09:23 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:20 Emperor: rebalance codfw swift rings [[phab:T354872|T354872]] * 09:18 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1068.eqiad.wmnet with reason: host reimage * 09:10 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 09:06 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1065.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
mc1068.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1067.eqiad.wmnet with reason: host reimage * 09:06 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1066.eqiad.wmnet with reason: host reimage * 08:55 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 08:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1068.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1067.eqiad.wmnet with OS bullseye * 08:54 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1066.eqiad.wmnet with OS bullseye * 08:54 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1065.eqiad.wmnet with OS bullseye * 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2149 [[phab:T424341|T424341]]', diff saved to https://phabricator.wikimedia.org/P92520 and previous config saved to /var/cache/conftool/dbconfig/20260514-083916-marostegui.json * 08:08 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 07:01 kart_: Update cxserver to 2026-04-23-114216-production ([[phab:T423002|T423002]]) * 07:00 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 07:00 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc[2013,2023].codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance on pc3 * 06:40 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 06:40 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 06:39 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc2013: Replacing HW [[phab:T418973|T418973]] * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1158: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1158: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS trixie * 05:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:25 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage * 05:12 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS trixie * 05:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1158: Reimage to Trixie * 05:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Reimage to Trixie * 05:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s7 master: reimage to Debian Trixie * 05:04 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 49s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:07 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs: 
Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
== 2026-05-13 ==
* 21:12 Amir1: remapping thumbsize of 0 to 2 in all group0 wikis ([[phab:T376152|T376152]])
* 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003
* 20:55 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] (duration: 07m 48s)
* 20:51 jdlrobson@deploy1003: ladsgroup, jdlrobson: Continuing with deployment
* 20:49 jdlrobson@deploy1003: ladsgroup, jdlrobson: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:47 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287022{{!}}wgThumbLimits: Remove the exception for itwikiquote (T376152)]]
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:43 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] (duration: 07m 32s)
* 20:42 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 20:41 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 20:38 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:37 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:35 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287000{{!}}Handle share-highlight images w/o resizeUrl (T426215)]] * 20:33 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] (duration: 07m 26s) * 20:28 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 20:27 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1287006{{!}}Update small size for Swedish Wikipedia (T424910)]] * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:23 ebernhardson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] (duration: 07m 06s) * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply * 20:19 ebernhardson@deploy1003: ebernhardson: Continuing with deployment * 20:18 ebernhardson@deploy1003: ebernhardson: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 20:17 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. 
* 20:16 ebernhardson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286997{{!}}Revert "cirrus: AB test query suggester variants" (T407432)]] * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] (duration: 06m 47s) * 20:13 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 20:09 cjming@deploy1003: bpirkle, cjming: Continuing with deployment * 20:09 cjming@deploy1003: bpirkle, cjming: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:07 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1286981{{!}}Revert "Add wikibase.v1 module to the sandbox were it is present" (T422403)]] * 19:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply * 19:23 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply * 19:09 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply * 18:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply * 18:27 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:26 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply * 18:25 tchin@deploy1003: helmfile [codfw] START 
helmfile.d/services/eventgate-analytics-external: apply * 18:20 cmooney@dns2005: END - running authdns-update * 18:19 cmooney@dns2005: START - running authdns-update * 18:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply * 18:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:13 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:13 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for ulsfo and eqsin IPs - cmooney@cumin1003" * 18:09 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 18:01 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply * 18:00 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply * 17:50 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply * 17:50 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply * 17:47 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply * 17:47 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply * 17:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 17:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply * 17:42 tchin@deploy1003: helmfile [staging] START 
helmfile.d/services/eventgate-analytics-external: apply * 17:36 topranks: update OSPF config on magru core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:34 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:33 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:28 mutante: zuul1001 systemctl start zuul-scheduler ; /usr/bin/docker exec zuul-scheduler zuul-scheduler smart-reconfigure * 17:26 mutante: zuul1001 - stopping zuul-web; then manually running: /usr/sbin/usermod -u 923 zuul * 17:26 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:24 topranks: update OSPF config on esams core routers to shift traffic to switch links [[phab:T424611|T424611]] * 17:20 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply * 17:19 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply * 17:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 17:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet * 16:55 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet * 16:43 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 16:29 topranks: update OSPF config on drmrs core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:20 topranks: update OSPF config on eqsin core routers to shift traffic to switch links [[phab:T424611|T424611]] * 16:10 btullis@cumin1003: END (FAIL) - Cookbook 
sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 16:10 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:53 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:45 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:44 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:42 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:37 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.* * 15:36 fabfur: repooling cp7009 to test 
haproxy-awslc behavior ([[phab:T419825|T419825]]) * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 15:27 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.* * 15:27 fabfur: depooling cp7009 to install haproxy-awslc ([[phab:T419825|T419825]]) * 15:18 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:16 cmooney@dns2005: END - running authdns-update * 15:15 cmooney@dns2005: START - running authdns-update * 15:11 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:04 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:04 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:04 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 15:01 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: 
helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:00 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:54 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:53 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:53 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for missing ulsfo subnets - cmooney@cumin1003" * 14:51 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:50 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 14:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS trixie * 14:42 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] (duration: 07m 17s) * 14:37 kharlan@deploy1003: kharlan: Continuing with deployment * 14:36 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:34 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286917{{!}}WikiEditor: Populate user_groups in EditAttemptStep events (T424010)]] * 14:33 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for ulsfo network new switches - pt1979@cumin2002" * 14:33 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add missing DNS name for ulsfo network new switches - pt1979@cumin2002" * 14:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:28 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:19 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] (duration: 06m 35s) * 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage * 14:16 jforrester@deploy1003: helmfile [eqiad] START 
helmfile.d/services/wikifunctions: apply * 14:15 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003 * 14:15 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:15 jforrester@deploy1003: jforrester: Continuing with deployment * 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: jforrester: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:12 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1286924{{!}}Disable wgWikiLambdaEnableAbstractClientMode everywhere (T422647)]] * 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * 14:08 Lucas_WMDE: UTC afternoon backport+config window done * 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * {{safesubst:SAL entry|1=14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAl}} * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:03 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Continuing with deployment * 14:03 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.* * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org * 14:02 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply * 14:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply * {{safesubst:SAL entry|1=14:01 lucaswerkmeister-wmde@deploy1003: dragoniez, matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-AP}} * 14:01 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply * 14:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply * 14:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS trixie * 13:59 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply * 13:59 
eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-canary: Restart for upgrade to JVM 11.0.31 - eevans@cumin1003 * {{safesubst:SAL entry|1=13:59 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286890{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286897{{!}}ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries (T426033)]], [[gerrit:1286891{{!}}Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders (T425972)]], [[gerrit:1286892{{!}}Add 'Promise-Non-Write-API-Action' to $wgAll}} * 13:58 fabfur: repooling cp7001 to test haproxy-awslc behavior ([[phab:T419825|T419825]]) * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org * 13:50 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] (duration: 07m 36s) * 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:45 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Continuing with deployment * 13:44 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, codenamenoreste: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1284900{{!}}Completely disable MediaWiki page patrolling functions on German Wikipedia (T316393)]] * {{safesubst:SAL entry|1=13:40 mfossati@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers t}} * 13:36 mfossati@deploy1003: jdlrobson, mfossati: Continuing with deployment * {{safesubst:SAL entry|1=13:29 mfossati@deploy1003: jdlrobson, mfossati: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], [[gerrit:1286846{{!}}ShareHighlight: exclude browsers that d}} * 13:28 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java security update - jmm@cumin2002 * 13:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * {{safesubst:SAL entry|1=13:27 mfossati@deploy1003: Started scap sync-world: Backport for [[gerrit:1286518{{!}}[Share Highlight] Exclude section edit links, footnotes from selection (T423658)]], [[gerrit:1286838{{!}}Add robust color fallbacks for QuoteCard average-color styling (T425358)]], [[gerrit:1286839{{!}}Fixed card width (T425710)]], [[gerrit:1286844{{!}}Adjust image size to match fixed width (T425710)]], 
[[gerrit:1286846{{!}}ShareHighlight: exclude browsers th}} * 13:25 moritzm: installing openjdk-11 security updates * 13:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 13:12 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] (duration: 08m 18s) * 13:07 sbisson@deploy1003: sbisson: Continuing with deployment * 13:05 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:04 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw * 13:03 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286359{{!}}Add configurable user-agent and sparql endpoint url (T425389)]] * 12:50 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] (duration: 06m 42s) * 12:46 mszwarc@deploy1003: mszwarc: Continuing with deployment * 12:45 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:43 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1286884{{!}}Fix TypeError on saving userrights interwiki (T426185)]] * 12:41 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.* * 12:40 fabfur: depool cp7001 to test haproxy-awslc (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1286526) ([[phab:T419825|T419825]]) * 12:38 topranks: add ibgp peering between cr1-magru and cr2-magru over loopback IPs [[phab:T424611|T424611]] * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.major-upgrade (exit_code=0) * 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1236: Migration of db1236.eqiad.wmnet completed * 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet * 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet * 12:02 topranks: add ibgp peering between cr1-esams and cr2-esams over loopback IPs [[phab:T424611|T424611]] * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:57 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:57 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update records for drmrs ibgp link - cmooney@cumin1003" * 11:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2220: after reimage to trixie * 11:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 11:51 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1236: Migration of db1236.eqiad.wmnet completed * 11:44 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 11:43 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 11:43 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS trixie * 11:40 topranks: delete old direct ibgp peering between cr1-drmrs and cr2-drmrs [[phab:T424611|T424611]] * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 11:33 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 11:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 11:27 topranks: add ibgp peering between cr1-drmrs and cr2-drmrs over loopback IPs [[phab:T424611|T424611]] * 11:25 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:24 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 11:24 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 11:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage * 11:19 moritzm: installing Linux 6.1.170-3 on all Bookworm hosts * 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki2002.codfw.wmnet with OS trixie * 11:10 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2220: after reimage to trixie * 11:06 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS trixie * 11:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1236: Upgrading db1236.eqiad.wmnet * 11:03 fceratto@cumin1003: START - Cookbook sre.mysql.major-upgrade * 10:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS trixie * 10:55 jiji@deploy1003: helmfile 
[codfw] DONE helmfile.d/services/ratelimit: apply * 10:55 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org * 10:52 moritzm: installing Linux 5.10.251-4 on all Bullseye hosts * 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org * 10:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki2002.codfw.wmnet with reason: host reimage * 10:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 10:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 10:35 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:33 topranks: switch eqsin core router ibgp path to route via switches [[phab:T424611|T424611]] * 10:26 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage * 10:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki2002.codfw.wmnet with OS trixie * 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 10:21 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for 
hosts pki2002.codfw.wmnet * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:16 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:15 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:15 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:14 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:10 moritzm: installing Apache security updates on Bullseye * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:09 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 10:06 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS trixie * 10:05 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:05 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1064.eqiad.wmnet with OS bullseye * 10:04 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2220: Reimage to Trixie * 10:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Reimage to Trixie * 10:02 jmm@deploy1003: helmfile [staging] DONE 
helmfile.d/services/proton: apply * 10:01 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2220 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92500 and previous config saved to /var/cache/conftool/dbconfig/20260513-095934-marostegui.json * 09:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2218 to s7 primary [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92499 and previous config saved to /var/cache/conftool/dbconfig/20260513-095814-marostegui.json * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:58 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 09:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1062.eqiad.wmnet with OS bullseye * 09:56 moritzm: installing distro-info-data updates from Bookworm point release * 09:54 marostegui: Starting s7 codfw failover from db2220 to db2218 - [[phab:T426142|T426142]] * 09:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 [[phab:T426142|T426142]] * 09:53 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1061.eqiad.wmnet with OS bullseye * 09:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2218 with weight 0 [[phab:T426142|T426142]]', diff saved to https://phabricator.wikimedia.org/P92498 and previous config saved to /var/cache/conftool/dbconfig/20260513-095337-marostegui.json * 09:51 moritzm: installing ca-certificates update from Bookworm point release * 09:50 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1060.eqiad.wmnet with OS bullseye * 09:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
mc1064.eqiad.wmnet with reason: host reimage * 09:45 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] (duration: 09m 01s) * 09:42 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:41 kharlan@deploy1003: kharlan: Continuing with deployment * 09:38 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:38 kharlan@deploy1003: kharlan: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:36 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1284633{{!}}EventStreamConfig: Register special_user_login event stream (T425631)]] * 09:34 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1064.eqiad.wmnet with reason: host reimage * 09:30 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1062.eqiad.wmnet with reason: host reimage * 09:29 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1061.eqiad.wmnet with reason: host reimage * 09:29 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1060.eqiad.wmnet with reason: host reimage * 09:28 cmooney@dns2005: END - running authdns-update * 09:27 cmooney@dns2005: START - running authdns-update * 09:27 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:25 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki2002.codfw.wmnet * 09:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for 
host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:22 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2002.codfw.wmnet with reason: reimage * 09:21 logmsgbot: dreamyjazz Deployed security patch for [[phab:T423840|T423840]] * 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1064.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1062.eqiad.wmnet with OS bullseye * 09:17 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1061.eqiad.wmnet with OS bullseye * 09:17 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1060.eqiad.wmnet with OS bullseye * 09:14 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:10 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for 2620:0:863:fe09::/64 - cmooney@cumin1003" * 09:07 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:45 moritzm: installing dnsmasq security updates * 08:40 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:38 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - 
cmooney@cumin1003" * 08:38 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 08:38 cmooney@dns2005: END - running authdns-update * 08:37 cmooney@dns2005: START - running authdns-update * 08:36 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 08:35 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:32 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add include for 2620:0:863:fe0a::/64 - cmooney@cumin1003" * 08:32 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 08:28 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 08:25 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 08:25 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 08:24 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] (duration: 09m 18s) * 08:20 kharlan@deploy1003: kharlan: Continuing with deployment * 08:16 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:14 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286805{{!}}WikimediaEvents: Enable Special:UserLogin instrumentation (T425631)]] * 08:11 moritzm: imported dnsmasq 2.92-1~wmf13u2 to trixie-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 08:08 topranks: reconfigure link from cr4-ulsfo to asw1-22-ulsfo as 802.1q tagged [[phab:T424611|T424611]] * 07:56 moritzm: imported dnsmasq 2.92-1~wmf12u2 to bookworm-wikimedia/main (backport of latest dnsmasq security fixes to our internal build) * 07:47 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] (duration: 09m 09s) * 07:43 dcausse@deploy1003: atsuko, dcausse: Continuing with deployment * 07:40 dcausse@deploy1003: atsuko, dcausse: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:39 gkyziridis@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 07:39 gkyziridis@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 07:38 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286371{{!}}translate: add opensearch-ttmserver-test (T425377)]] * 07:37 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 07:37 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 07:34 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 09m 32s) * 07:30 dcausse@deploy1003: dcausse, wmde-fisch: Continuing with deployment * 07:27 dcausse@deploy1003: dcausse, wmde-fisch: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:25 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286400{{!}}testwiki: Disable sub-ref's synthetic list defined refs on test wikis (T425967)]], [[gerrit:1286277{{!}}Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:18 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 07:18 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 07:17 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * 07:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: after reimage to trixie * 07:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: after reimage to trixie * 06:39 moritzm: installing Exim security updates on the hosts where Exim is used as a local mail relay * 06:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: after reimage to trixie * 06:27 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS trixie * 06:26 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1253: after reimage to trixie * 06:22 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1253.eqiad.wmnet with OS trixie * 06:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage * 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1253.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook 
sre.hosts.reimage for host db1253.eqiad.wmnet with OS trixie * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS trixie * 05:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1253: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Reimage to Trixie * 05:35 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2218: Reimage to Trixie * 05:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Reimage to Trixie * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1278.eqiad.wmnet with OS bookworm * 04:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 04:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:57 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1278.eqiad.wmnet with reason: host reimage * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1277.eqiad.wmnet with OS bookworm * 03:42 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:42 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1278.eqiad.wmnet with OS bookworm * 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1276.eqiad.wmnet with OS bookworm * 03:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:17 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1277.eqiad.wmnet with reason: host reimage * 03:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1278.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1278 * 03:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1278 * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 03:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update 
mgmt [db1278] - vriley@cumin1003" * 03:07 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1278] - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1276.eqiad.wmnet with reason: host reimage * 03:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 03:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1277.eqiad.wmnet with OS bookworm * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:49 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1276.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1275.eqiad.wmnet with OS bookworm * 02:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1277.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1277 * 02:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1277 * 02:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:25 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:25 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1277] - vriley@cumin1003" * 02:21 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:19 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1274.eqiad.wmnet with OS bookworm * 02:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:16 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1276.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:15 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1276 * 02:13 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1275.eqiad.wmnet with reason: host reimage * 02:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1276 * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1276] - vriley@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 44s) * 02:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1275.eqiad.wmnet with OS bookworm * 01:37 
vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] (duration: 06m 35s) * 01:28 zabe@deploy1003: zabe: Continuing with deployment * 01:27 zabe@deploy1003: zabe: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 01:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1274.eqiad.wmnet with OS bookworm * 01:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1286532{{!}}Start reading from new tables everywhere except commons (2nd try) (T416548)]] * 01:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1275.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:14 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 01:12 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1275 * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1275] - vriley@cumin1003" * 01:08 
vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1274.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 00:58 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1274
* 00:57 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1274
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:56 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:56 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1274] - vriley@cumin1003"
* 00:52 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1273.eqiad.wmnet with OS bookworm
* 00:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
* 00:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
== 2026-05-12 ==
* 23:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:48 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1273.eqiad.wmnet with reason: host reimage
* 23:46 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language 
fallback" (T425988)]] (duration: 12m 45s) * 23:40 cscott@deploy1003: cscott: Continuing with deployment * 23:39 cscott@deploy1003: cscott: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:33 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286506{{!}}Re-enable unit tests with updated output]], [[gerrit:1286516{{!}}Re-enable ContentHolderTest with updated output]], [[gerrit:1286515{{!}}Revert "Remove File::getHandler language fallback" (T425988)]] * 23:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] (duration: 33m 28s) * 23:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1273.eqiad.wmnet with OS bookworm * 22:53 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:49 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1272.eqiad.wmnet with OS bookworm * 22:40 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:40 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:32 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286514{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286513{{!}}Also merge views overflow into array-items (T426115)]], [[gerrit:1286421{{!}}Special:Preferences: Display three options for thumbsizes (T424910)]] * 22:21 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:21 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1272.eqiad.wmnet with reason: host reimage * 22:18 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] (duration: 34m 01s) * 22:05 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 22:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:01 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:59 dwisehaupt@dns1004: END - running authdns-update * 21:57 dwisehaupt@dns1004: START - running authdns-update * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1271.eqiad.wmnet with OS bookworm * 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:46 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:43 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286456{{!}}Disable interactions until load is complete (T422968 T424787)]] * 21:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1273.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:41 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1273 * 21:40 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1272.eqiad.wmnet with OS bookworm * 21:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1273 * 21:38 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] (duration: 11m 56s) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 21:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1273] - vriley@cumin1003" * 
21:32 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:31 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Continuing with deployment * 21:29 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:29 cscott@deploy1003: danielyepezgarces, cscott, vadymts1: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:26 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1283048{{!}}Enabling RSS extension for cowikimedia chapter (T425440)]], [[gerrit:1286390{{!}}Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary (T425332)]] * 21:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:23 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 21:19 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] (duration: 14m 51s) * 21:15 
cscott@deploy1003: cscott: Continuing with deployment * 21:15 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:07 cscott@deploy1003: cscott: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Change * 21:06 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1270.eqiad.wmnet with OS bookworm * 21:05 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 21:05 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1286484{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T409751 T420336 T425981)]], [[gerrit:1286485{{!}}Bump wikimedia/parsoid to 0.24.0-a3 (T425981)]], [[gerrit:1286488{{!}}Disable unit tests that fail with new vendor 
release]], [[gerrit:1286489{{!}}Skip ContentHolderTest that fails with new vendor release]] * 21:03 topranks: migrate link from cr1-drmrs to asw1-b13-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 21:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:54 topranks: migrate link from cr2-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1271.eqiad.wmnet with OS bookworm * 20:50 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] (duration: 09m 03s) * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:46 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new link networks - cmooney@cumin1003" * 20:46 samtar@deploy1003: samtar, dreamrimmer: Continuing with deployment * 20:44 topranks: migrate link from cr1-drmrs to asw1-b12-drmrs to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:43 samtar@deploy1003: samtar, dreamrimmer: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:42 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1270.eqiad.wmnet with reason: host reimage * 20:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 20:41 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1271.eqiad.wmnet with reason: host reimage * 20:41 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1285482{{!}}Allow svwiki bureaucrats to remove sysop rights (T425806)]] * 20:35 topranks: migrate link from cr2-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:26 dbrant@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] (duration: 08m 27s) * 20:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1271.eqiad.wmnet with OS bookworm * 20:23 topranks: migrate link from cr1-esams to asw1-by27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:20 dbrant@deploy1003: dbrant: Continuing with deployment * 20:20 dbrant@deploy1003: dbrant: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. (T426010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:18 dbrant@deploy1003: Started scap sync-world: Backport for [[gerrit:1285930{{!}}docroot: Add "get_login_creds" permission to Android app. 
(T426010)]] * 20:16 topranks: migrate link from cr2-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:15 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] (duration: 11m 47s) * 20:11 alexsanford@deploy1003: alexsanford: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:05 alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:05 topranks: migrate link from cr1-esams to asw1-bw27-esams to L2 trunk on the switch side [[phab:T424611|T424611]] * 20:03 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1285905{{!}}Enforce 2FA requirements for phase 2 groups (T423119)]], [[gerrit:1286469{{!}}Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3 (T423119 T423120)]] * 20:00 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 19:59 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:58 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:54 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 19:52 topranks: migrate link from cr2-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 topranks: migrate link from cr1-magru to asw1-b4-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:34 dancy@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] (duration: 07m 07s) * 19:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage * 19:30 dancy@deploy1003: jforrester, dancy: Continuing with deployment * 19:30 dancy@deploy1003: jforrester, dancy: Backport for 
[[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:27 dancy@deploy1003: Started scap sync-world: Backport for [[gerrit:1286464{{!}}Fix MediaHandler caching to not preserve language (T425988 T425740 T425782)]] * 19:26 topranks: migrate link from cr2-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 19:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 19:06 topranks: migrate link from cr1-magru to asw1-b3-magru to L2 trunk on the switch side [[phab:T424611|T424611]] * 19:05 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:42 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:35 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 18:08 brett@cumin2002: END (ERROR) - Cookbook 
sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:56 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] (duration: 16m 08s) * 17:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:53 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:52 otto@deploy1003: otto: Continuing with deployment * 17:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 
17:48 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:45 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:42 otto@deploy1003: otto: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:40 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1286434{{!}}EventStreamConfig - ingest mediawiki.user_change into the Data Lake (T423952)]] * 17:39 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:39 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:38 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 17:37 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 17:37 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 17:37 
jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 17:36 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 17:36 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 17:35 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub: apply * 16:46 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 16:25 moritzm: installing Exim security updates on lists/vrts hosts * 16:00 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:57 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] (duration: 07m 22s) * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:52 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:48 ladsgroup@deploy1003: ladsgroup, neriah: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup, neriah: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:45 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1286384{{!}}wikinews: Remove unnecessary settings (T421796)]] * 15:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:37 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 15:35 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 15:34 jelto: helm uninstall -n miscweb design-strategy - [[phab:T329991|T329991]] * 15:33 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:31 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 15:30 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 15:29 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:28 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 15:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 15:25 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 15:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 15:24 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:23 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:23 dancy@deploy1003: Installation of scap version "4.264.0" completed for 1 hosts * 15:22 dancy@deploy1003: Installing scap version "4.264.0" for 1 host(s) * 15:17 dancy@deploy1003: Installing scap version "4.264.0" for 163 host(s) * 15:12 eevans@deploy1003: helmfile [staging] DONE 
helmfile.d/services/linked-artifacts: apply * 15:12 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 15:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1270.eqiad.wmnet with OS bookworm * 14:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance * 14:55 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:54 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:53 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:50 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1057.eqiad.wmnet with OS bullseye * 14:47 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1056.eqiad.wmnet with OS bullseye * 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test2001.codfw.wmnet with OS bookworm * 14:45 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:44 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-toolhub-test: apply * 14:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1059.eqiad.wmnet with OS bullseye * 14:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1058.eqiad.wmnet with OS bullseye * 14:36 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs2009 to dse-k8s-wdqs-test2001 * 14:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test2001 * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test2001 on all recursors * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-wdqs-test1001.eqiad.wmnet with OS bookworm * 14:32 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs2009 to dse-k8s-wdqs-test2001 - btullis@cumin1003" * 14:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from wdqs1028 to dse-k8s-wdqs-test1001 * 14:28 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-wdqs-test1001 * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:26 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-wdqs-test1001 on all recursors * 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
14:26 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:26 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs2009 to dse-k8s-wdqs-test2001 * 14:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming wdqs1028 to dse-k8s-wdqs-test1001 - btullis@cumin1003" * 14:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:22 btullis@cumin1003: START - Cookbook sre.dns.netbox * 14:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:21 btullis@cumin1003: START - Cookbook sre.hosts.rename from wdqs1028 to dse-k8s-wdqs-test1001 * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1059.eqiad.wmnet with reason: host reimage * 14:20 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1057.eqiad.wmnet with reason: host reimage * 14:20 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1056.eqiad.wmnet with reason: host reimage * 14:19 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1058.eqiad.wmnet with reason: host reimage * 14:17 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply * 14:17 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply * 14:15 Lucas_WMDE: UTC afternoon backport+config window done * 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] (duration: 07m 
02s) * 14:11 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 14:10 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1271.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:10 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1271 * 14:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:08 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286372{{!}}Revert "page_change - add revision.revert info"]] * 14:08 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply * 14:08 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply * 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1059.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1058.eqiad.wmnet with OS bullseye * 14:07 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1057.eqiad.wmnet with OS bullseye * 14:07 root@cumin1003: START - Cookbook sre.hosts.reimage for host mc1056.eqiad.wmnet with OS bullseye * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. 
* 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 14:07 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 14:07 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] (duration: 39m 36s) * 14:06 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 14:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1271 * 14:05 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:05 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Rolling back deployment * 14:05 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 14:04 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1272.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1272 * 14:03 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1272 * 14:02 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 14:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1272] - vriley@cumin1003" * 13:57 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:57 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:54 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:54 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 13:51 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1270.eqiad.wmnet with OS bookworm * 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 13:49 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 13:49 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 13:49 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 13:48 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: 
sync * 13:48 ottomata: roll restart eventgate main to pick up mediawiki/page/change/1.4.0 schema version for [[phab:T423583|T423583]] * 13:32 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 13:29 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde, otto: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1286336{{!}}Keep all long, non-wrapping values inside parent element (T425176)]], [[gerrit:1286341{{!}}page_change - add revision.revert info]] * 13:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2233.codfw.wmnet with reason: Reboot * 13:17 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2006.codfw.wmnet with reason: Reboot * 13:14 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] (duration: 07m 13s) * 13:09 sbisson@deploy1003: sbisson: Continuing with deployment * 13:08 sbisson@deploy1003: sbisson: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:06 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1286334{{!}}ArticleGuidance: set sparql endpoint (T425389)]] * 12:40 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply * 12:38 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply * 12:26 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply * {{safesubst:SAL entry|1=12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T42}} * 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with deployment * 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425940)]] synced * {{safesubst:SAL entry|1=12:15 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1286328{{!}}Make DiscussionTools not show hCaptcha initially unless configured (T425955)]], [[gerrit:1286324{{!}}Show CAPTCHA if required for all edits before first edit attempt (T425955)]], [[gerrit:1286322{{!}}hCaptcha: Enable for DiscussionTools on testwiki (T426039)]], [[gerrit:1286318{{!}}hCaptcha: Enable for VisualEditor and MobileFrontend mediawikiwiki (T425}} * 12:10 kharlan@deploy1003: Finished scap sync-world: 
Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] (duration: 07m 45s) * 12:06 kharlan@deploy1003: kharlan: Continuing with deployment * 12:04 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:02 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286309{{!}}Special:UserLogin: Instrument no-JS form submissions (T425631)]] * 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new networks ibgp peering - cmooney@cumin1003" * 09:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] (duration: 07m 43s) * 09:51 kharlan@deploy1003: kharlan: Continuing with deployment * 09:50 kharlan@deploy1003: kharlan: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:48 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1286295{{!}}Update UserEntitySerializer callers (T426026)]] * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92480 and previous config saved to /var/cache/conftool/dbconfig/20260512-092034-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92479 and previous config saved to /var/cache/conftool/dbconfig/20260512-091025-fceratto.json * 09:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036', diff saved to https://phabricator.wikimedia.org/P92478 and previous config saved to /var/cache/conftool/dbconfig/20260512-090017-fceratto.json * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92477 and previous config saved to /var/cache/conftool/dbconfig/20260512-085009-fceratto.json * 08:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1036 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92476 and previous config saved to /var/cache/conftool/dbconfig/20260512-083526-fceratto.json * 08:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance * 08:21 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2150: after reimage to trixie * 08:17 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 08:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1231: after reimage to trixie * 08:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply * 08:07 
cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply * 08:03 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] (duration: 07m 02s) * 08:00 dcausse@deploy1003: dcausse: Rolling back deployment * 08:00 dcausse@deploy1003: dcausse: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:56 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1286253{{!}}Revert "cirrus: use a keywork tokenizer for the plain field for autocomplete"]] * 07:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2150: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS trixie * 07:29 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1231: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS trixie * 07:08 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 07:04 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage * 06:59 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2142.codfw.wmnet * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:47 marostegui@cumin1003: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:46 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2142.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 06:43 jayme@deploy1003: Finished scap sync-world: update rsyslog image, [[phab:T418200|T418200]] (duration: 07m 56s) * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 06:42 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS trixie * 06:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1231: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Reimage to Trixie * 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2150: Reimage to Trixie * 06:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Reimage to Trixie * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2142.codfw.wmnet * 06:36 jayme@deploy1003: Started scap sync-world: update rsyslog image, [[phab:T418200|T418200]] * 06:27 jayme@dns1004: END - running authdns-update * 06:26 jayme@dns1004: START - running authdns-update * 03:39 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.2 refs 
[[phab:T423911|T423911]] (duration: 36m 36s) * 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.2 refs [[phab:T423911|T423911]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 38s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:37 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply * 00:37 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply * 00:36 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:35 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 00:14 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 00:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] (duration: 07m 24s) * 00:03 jdlrobson@deploy1003: jdlrobson: Continuing with deployment * 00:02 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:00 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285907{{!}}Skin: Correct thumbnail class (T424910)]]
== 2026-05-11 ==
* 23:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] (duration: 06m 21s)
* 23:41 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:40 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:38 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285864{{!}}Exclude sitesupport from button/icon treatment, remove manual styling (T425721)]]
* 23:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] (duration: 06m 29s)
* 23:20 jdlrobson@deploy1003: jdlrobson: Continuing with deployment
* 23:19 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:18 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1285464{{!}}Add support for icons in toolbox (T424571)]]
* 21:51 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] (duration: 06m 26s)
* 21:47 cjming@deploy1003: cjming: Continuing with deployment
* 21:47 cjming@deploy1003: cjming: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:45 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1285916{{!}}WikiLambdaApi instrument: update schema (T415254)]] * 21:29 maryum: Deployed security fix for [[phab:T425406|T425406]] * 21:16 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 21:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 21:15 mstyles@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] (duration: 06m 36s) * 21:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 mstyles@deploy1003: sbassett, mstyles: Continuing with deployment * 21:10 mstyles@deploy1003: sbassett, mstyles: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:09 mstyles@deploy1003: Started scap sync-world: Backport for [[gerrit:1284008{{!}}Enable CSPUseReportURIDirective in Wikimedia production (T424058)]] * 21:03 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1270.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:53 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:53 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1270] - vriley@cumin1003" * 20:49 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1269.eqiad.wmnet with OS bookworm * 20:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:41 jdrewniak@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] (duration: 09m 51s) * 20:37 jdrewniak@deploy1003: jdrewniak: Continuing with deployment * 20:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:33 jdrewniak@deploy1003: jdrewniak: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:32 jdrewniak@deploy1003: Started scap sync-world: Backport for [[gerrit:1285866{{!}}Bumping portals to master (T128546)]] * 20:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1269.eqiad.wmnet with reason: host reimage * 20:02 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] (duration: 06m 57s) * 20:00 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1269.eqiad.wmnet with OS bookworm * 19:58 zabe@deploy1003: zabe: Continuing with deployment * 19:57 zabe@deploy1003: zabe: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:55 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1285853{{!}}Start reading from new file tables on all small and medium wikis (T416548)]] * 19:44 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye * 19:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bvibber out of all services on: 2453 hosts * 19:39 inflatador: [bking@cumin2002] ~$ sudo cumin 'A:wdqs-main and A:codfw' 'systemctl restart wdqs-blazegraph' <- restart after banning scraper * 19:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1269.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 19:24 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host db1269 * 19:23 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1269 * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:22 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1269] - vriley@cumin1003" * 19:18 vriley@cumin1003: START - Cookbook sre.dns.netbox * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1268.eqiad.wmnet with OS bookworm * 19:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:16 dzahn@dns1005: END - running authdns-update * 19:14 dzahn@dns1005: START - running authdns-update * 19:12 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 19:11 inflatador: bking@archiva1002 `sudo rm -rfv /var/cache/archiva/temp* && sudo systemctl restart archiva`. 
to free up disk space * 18:56 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 18:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1268.eqiad.wmnet with reason: host reimage * 18:25 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 18:13 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 18:13 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 18:12 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 18:12 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1006.eqiad.wmnet with OS trixie * 18:12 ottomata: roll restarting eventgate-main to pick up changes for [[phab:T423952|T423952]] * 18:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1268.eqiad.wmnet with OS bookworm * 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1268.eqiad.wmnet with OS bookworm * 17:55 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:52 sukhe@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage 
(exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye * 17:47 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 17:43 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 17:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1268.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 17:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92464 and previous config saved to /var/cache/conftool/dbconfig/20260511-173804-fceratto.json * 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1268 * 17:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1268 * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1268] - vriley@cumin1003" * 17:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92463 and previous config saved to /var/cache/conftool/dbconfig/20260511-172756-fceratto.json * 17:25 vriley@cumin1003: START - Cookbook sre.dns.netbox * 17:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047', diff saved to https://phabricator.wikimedia.org/P92462 and previous config saved to /var/cache/conftool/dbconfig/20260511-171747-fceratto.json * 17:15 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye * 17:12 dancy@deploy1003: 
Installation of scap version "4.263.0" completed for 2 hosts * 17:11 dancy@deploy1003: Installing scap version "4.263.0" for 2 host(s) * 17:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92461 and previous config saved to /var/cache/conftool/dbconfig/20260511-170739-fceratto.json * 17:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:07 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 17:06 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 17:05 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1006.eqiad.wmnet with OS trixie * 17:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1047 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92460 and previous config saved to /var/cache/conftool/dbconfig/20260511-170024-fceratto.json * 17:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance * 16:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:51 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:50 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 16:40 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply * 16:39 
jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 16:39 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 16:38 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 16:37 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 16:36 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 16:27 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] (duration: 06m 54s) * 16:25 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:25 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:24 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply * 16:23 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply * 16:23 zabe@deploy1003: zabe: Continuing with deployment * 16:22 zabe@deploy1003: zabe: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 16:20 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281506{{!}}Disable FlaggedRevs on wikinews (T423577)]] * 16:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:02 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:00 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply * 16:00 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply * 15:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] (duration: 07m 48s) * 15:54 zabe@deploy1003: zabe: Continuing with deployment * 15:52 zabe@deploy1003: zabe: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:50 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281491{{!}}Remove custom user groups from Wikinews (T423578)]] * 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:46 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] (duration: 06m 32s) * 15:42 zabe@deploy1003: zabe: Continuing with deployment * 15:41 zabe@deploy1003: zabe: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:40 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:39 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280418{{!}}Start reading from new file tables on testwiki (2nd try) (T416548)]] * 15:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 15:21 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 15:17 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:55 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2012.codfw.wmnet with reason: DIMM replacement * 14:54 cdanis@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:54 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:46 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:43 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017 * 14:42 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017 * 14:42 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 14:41 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 14:41 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:39 Lucas_WMDE: UTC afternoon backport+config window done * 14:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] (duration: 18 * 14:38 vriley@cumin1003: START - Cookbook sre.dns.netbox * 14:33 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Continuing with deployment * {{safesubst:SAL entry|1=14:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, jforrester, matmarex, sfaci: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], 
[[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now}} * 14:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285448{{!}}Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes (T196386)]], [[gerrit:1278704{{!}}WikiLambdaApi: update stream configuration (T415254)]], [[gerrit:1285352{{!}}WikiLambdaApi instrument: Sets the custom schemaID (T415254)]], [[gerrit:1285406{{!}}editSaves: getExperiment returns a promise now (T425785)]] * {{safesubst:SAL entry|1=14:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (}} * 14:15 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm * 14:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL * 14:05 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with deployment * {{safesubst:SAL entry|1=14:04 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group}} * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-eqiad@eqiad * 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1055.eqiad.wmnet with OS bookworm * 13:56 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs * 13:50 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-eqiad@eqiad * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: dse-k8s-worker-codfw@codfw * 13:50 btullis@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:49 btullis@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs * 13:47 btullis@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: dse-k8s-worker-codfw@codfw * 13:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * {{safesubst:SAL entry|1=13:38 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1285460{{!}}Prevent username registration if the username previously existed (T196386)]], [[gerrit:1285461{{!}}Prevent username registration if the username 
previously existed (v2) (T196386)]], [[gerrit:1285462{{!}}API: Introduce list=globalusers (T261752)]], [[gerrit:1285761{{!}}list=globalusers: Avoid querying group permissions with empty group list (T}} * 13:36 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 13:34 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:34 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:32 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:30 btullis: restarting pybal on lvs1019 and lvs1020 for [[phab:T420437|T420437]] * 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] (duration: 06m 28s) * 13:25 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 13:24 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS bookworm * 13:22 jiji@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc1055.eqiad.wmnet with OS trixie * 13:22 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Continuing with deployment * 13:21 lucaswerkmeister-wmde@deploy1003: audreypenven, lucaswerkmeister-wmde: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:21 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1270482{{!}}Enable and configure WikiProjects prototype on Wikidata beta (T421850)]] * 13:19 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 13:19 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 13:18 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 13:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 13:16 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 13:15 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 13:14 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 13:07 otto@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] (duration: 08m 05s) * 13:06 elukey: remove old discovery pki intermediate * 13:03 otto@deploy1003: otto: Continuing with deployment * 13:01 otto@deploy1003: otto: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:59 otto@deploy1003: Started scap sync-world: Backport for [[gerrit:1285525{{!}}EventStreamConfig - add mediawiki.user_change.dev0 (T423952)]] * 12:59 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 12:58 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. 
* 12:53 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] (duration: 12m 07s) * 12:47 kharlan@deploy1003: kharlan: Continuing with deployment * 12:45 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285789{{!}}hCaptcha: Enable editing on group0 wikis (T425354)]] * 12:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:18 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1055.eqiad.wmnet with reason: host reimage * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host mc1055.eqiad.wmnet with OS trixie * 12:04 topranks: push out updated ACL to Nokia switches for BGP connections ([[phab:T425703|T425703]]) and add BFD config ([[phab:T425813|T425813]]) * 11:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2185.codfw.wmnet with reason: Reboot * 11:31 moritzm: installing Linux 6.12.86 on Trixie hosts * 11:27 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply * 11:27 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply * 11:21 jayme@deploy1003: Finished scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] (duration: 13m 28s) * 11:21 jayme@deploy1003: Rolling back deployment * 11:08 jayme@deploy1003: Started scap sync-world: upgrade rsyslog on all deployments [[phab:T418200|T418200]] * 11:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 11:00 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance * 10:59 jayme: upgrading rsyslog to 8.2504.0-1 in all mediawiki deployments - [[phab:T418200|T418200]] * 10:52 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clément Goubert out of all services on: 2459 hosts * 10:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance * 10:26 jayme@deploy1003: Finished scap sync-world: update rsyslog image (duration: 03m 48s) * 10:23 jayme@deploy1003: Started scap sync-world: update rsyslog image * 10:22 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply * 10:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/services/ratelimit: apply * 10:16 slyngs: Migrate of lvs2012 due to hardware issues * 10:14 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:13 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 10:12 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 10:11 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] (duration: 30m 15s) * 10:10 moritzm: rebalance routed Ganeti cluster in eqsin [[phab:T421863|T421863]] * 10:06 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:04 fceratto@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 10:01 fceratto@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:59 kharlan@deploy1003: kharlan: Continuing with deployment * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:58 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:58 kharlan@deploy1003: kharlan: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 09:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:57 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2012.codfw.wmnet with reason: Hardware failure * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:46 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1230: [[phab:T419635|T419635]] * 09:41 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1285731{{!}}hCaptcha: Enable for group0 wikis (T425354)]] * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:37 
jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92456 and previous config saved to /var/cache/conftool/dbconfig/20260511-092010-fceratto.json * 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92454 and previous config saved to /var/cache/conftool/dbconfig/20260511-091001-fceratto.json * 09:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:08 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:07 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:06 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install5004.wikimedia.org to drbd * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P92453 and previous config saved to /var/cache/conftool/dbconfig/20260511-085954-fceratto.json * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:56 fceratto@cumin1003: START - Cookbook 
sre.mysql.pool pool db1230: [[phab:T419635|T419635]] * 08:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92451 and previous config saved to /var/cache/conftool/dbconfig/20260511-084945-fceratto.json * 08:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install5004.wikimedia.org to drbd * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2218 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92450 and previous config saved to /var/cache/conftool/dbconfig/20260511-084236-fceratto.json * 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance * 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin02 and group 01 * 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet * 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet * 08:10 slyngshede@dns1004: END - running authdns-update * 08:08 slyngshede@dns1004: START - running authdns-update * 08:05 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 08:05 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 08:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:00 ayounsi@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 08:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old eqsin ganeti cluster VIP - ayounsi@cumin1003" * 07:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:55 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ratelimit: apply * 07:50 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 07:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 07:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 07:47 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ratelimit: apply * 07:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply * 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply * 07:08 elukey@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zarcillo.discovery.wmnet on all recursors * 07:08 elukey@cumin1003: START - Cookbook sre.dns.wipe-cache zarcillo.discovery.wmnet on all recursors * 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5004.eqsin.wmnet with OS bookworm * 06:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 06:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5004.eqsin.wmnet with reason: host reimage * 
06:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2002.codfw.wmnet * 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2002.codfw.wmnet * 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast6003.wikimedia.org * 05:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast6003.wikimedia.org * 05:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5004.eqsin.wmnet with OS bookworm * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 58s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-10 == * 18:25 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:20 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per 
[[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 18:11 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425504]]' ESEAP_Hub_Charter 'ESEAP Hub/Governance/Charter/Previous draft' 'Martin Urbanec' # [[phab:T425504|T425504]] * 18:09 urbanecm@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki '--reason=per [[:phab:T425503]]' ESEAP_Preparatory_Council/Proposed_theory_of_change 'ESEAP Hub/Governance/ESEAP Preparatory Council/Proposed theory of change' 'Martin Urbanec' # [[phab:T425503|T425503]] * 02:06 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-05-09 == * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 10:34 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix dsl column size - oblivian@cumin1003 * 10:33 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix dsl column size - oblivian@cumin1003" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1267.eqiad.wmnet 
with OS bookworm * 01:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:44 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1267.eqiad.wmnet with reason: host reimage * 00:29 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1267.eqiad.wmnet with OS bookworm * 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED == 2026-05-08 == * 23:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1267.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 23:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1267 * 23:32 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1267 * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1267] - vriley@cumin1003" * 23:26 vriley@cumin1003: START - Cookbook sre.dns.netbox * 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1266.eqiad.wmnet with OS bookworm * 
23:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 22:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:46 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1266.eqiad.wmnet with reason: host reimage * 22:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1266.eqiad.wmnet with OS bookworm * 22:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1266.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1266 * 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1266 * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:51 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1266] - vriley@cumin1003" * 21:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1265.eqiad.wmnet with OS bookworm * 21:42 vriley@cumin1003: END (PASS) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:41 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 21:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 21:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1265.eqiad.wmnet with reason: host reimage * 20:54 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host db1265.eqiad.wmnet with OS bookworm * 20:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:32 vriley@cumin1003: START - Cookbook sre.hosts.provision for host db1265.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1265 * 20:30 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db1265 * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:29 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:29 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [db1265] - vriley@cumin1003" * 20:24 vriley@cumin1003: START - Cookbook sre.dns.netbox * 20:01 ryankemper: [WDQS] Added several more requestctl rules. They've helped marginally, but not enough to restore the service. 
Unless we find an obvious smoking gun, expect noise to continue for the time being :/ * 19:42 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 19:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 18:07 ryankemper: [WDQS] After those 2 requestctl rules, requests went down 20%, error rate decreased significantly, p50 cut almost in half, but the service is still unstable, likely we'll need to identify more throttle-candidates to restore full health * 17:53 ryankemper: [WDQS] Deployed 2 new requestctl rules; we'll see if it helps * 16:51 topranks: enable bfd on system0.0 sub-interface ssw1-d1-eqiad * 15:45 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup1003.eqiad.wmnet with reason: restart * 15:37 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup[1006,1017-1018].eqiad.wmnet with reason: restart * 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet * 14:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet * 14:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 10:51 btullis: re-pooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 10:50 btullis@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=wdqs-main,name=eqiad * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:14 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on backup1007.eqiad.wmnet with reason: restart * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:12 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:11 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:09 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:44 btullis: depooled wdqs-main in eqiad for [[phab:T425758|T425758]] * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:40 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:40 btullis@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=wdqs-main,name=eqiad * 09:36 jelto@deploy1003: helmfile 
[aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92437 and previous config saved to /var/cache/conftool/dbconfig/20260508-093251-fceratto.json * 09:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92435 and previous config saved to /var/cache/conftool/dbconfig/20260508-092243-fceratto.json * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P92434 and previous config saved to /var/cache/conftool/dbconfig/20260508-091238-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92433 and previous config saved to /var/cache/conftool/dbconfig/20260508-090230-fceratto.json * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1189 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92432 and previous config saved to /var/cache/conftool/dbconfig/20260508-085217-fceratto.json * 08:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92431 and previous config saved to /var/cache/conftool/dbconfig/20260508-085018-fceratto.json * 08:40 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92430 and previous config saved to /var/cache/conftool/dbconfig/20260508-084010-fceratto.json * 08:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P92429 and previous config saved to /var/cache/conftool/dbconfig/20260508-083003-fceratto.json * 08:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92428 and previous config saved to /var/cache/conftool/dbconfig/20260508-081954-fceratto.json * 08:18 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:17 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:04 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2207 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92427 and previous config saved to /var/cache/conftool/dbconfig/20260508-080438-fceratto.json * 08:04 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance * 07:59 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 07:56 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5003.wikimedia.org * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: install5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:09 jmm@cumin2002: START - Cookbook sre.dns.netbox * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2159: after reimage to trixie * 06:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5003.wikimedia.org * 06:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2159: after reimage to trixie * 06:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2159.codfw.wmnet with OS trixie * 06:11 moritzm: installing postorius security updates * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2159.codfw.wmnet with reason: host reimage * 05:27 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2159.codfw.wmnet with OS trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2159: Reimage to Trixie * 05:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Reimage to Trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1024.eqiad.wmnet with OS trixie * 03:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 03:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 02:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) 
for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:45 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1024.eqiad.wmnet with reason: host reimage * 02:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1024.eqiad.wmnet with OS trixie * 02:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 02:07 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1024 * 02:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1024 * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 02:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:04 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1024] - vriley@cumin1003" * 02:01 vriley@cumin1003: START - Cookbook sre.dns.netbox * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1023.eqiad.wmnet with OS trixie * 01:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:30 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 01:15 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1023.eqiad.wmnet with 
reason: host reimage * 01:11 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1023.eqiad.wmnet with reason: host reimage * 00:59 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1023.eqiad.wmnet with OS trixie * 00:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 00:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1023 * 00:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1023 * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 00:27 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:27 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1023] - vriley@cumin1003" * 00:20 vriley@cumin1003: START - Cookbook sre.dns.netbox == 2026-05-07 == * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1022.eqiad.wmnet with OS trixie * 23:25 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:24 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 23:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 
23:05 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1022.eqiad.wmnet with reason: host reimage * 22:53 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 22:25 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] (duration: 01m 53s) * 22:23 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (thin): Regular analytics weekly train THIN [analytics/refinery@b38efb19] * 22:23 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] (duration: 03m 52s) * 22:19 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1]: Regular analytics weekly train [analytics/refinery@b38efb19] * 22:18 amastilovic@deploy1003: Finished deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] (duration: 01m 55s) * 22:16 amastilovic@deploy1003: Started deploy [analytics/refinery@b38efb1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@b38efb19] * {{safesubst:SAL entry|1=21:27 cscott@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)}} * 21:23 cscott@deploy1003: cscott: Continuing with deployment * 21:17 cscott@deploy1003: cscott: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 
T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]] synced to the t * {{safesubst:SAL entry|1=21:16 cscott@deploy1003: Started scap sync-world: Backport for [[gerrit:1284828{{!}}Upgrading webonyx/graphql-php (v15.31.5 => v15.32.3)]], [[gerrit:1284834{{!}}composer.json: Update webonyx/graphql-php to ^15.32.3]], [[gerrit:1284832{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T319058 T368724 T373384 T420336 T423241 T423701 T424446 T424773 T425008 T425056 T425107 T425731)]], [[gerrit:1284837{{!}}Bump wikimedia/parsoid to 0.24.0-a2 (T425731)]}} * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1021.eqiad.wmnet with OS trixie * 20:53 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:49 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] (duration: 06m 38s) * 20:48 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003" * 20:45 kemayo@deploy1003: esanders, kemayo: Continuing with deployment * 20:44 kemayo@deploy1003: esanders, kemayo: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be v * 20:42 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1284575{{!}}Revert "Enable mobile editor abandonment survey on enwiki" (T424102)]], [[gerrit:1284702{{!}}Remove duplicate definition of EditCheckAction#isTagged (T425583)]], [[gerrit:1284703{{!}}Save action filtering info in ContentBranchNodeCheck#onDocumentChange (T425583)]] * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php commonswiki * 20:41 Krinkle: krinkle@deploy1003$ mwscript deleteEqualMessages.php nlwiki * 20:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:30 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1021.eqiad.wmnet with reason: host reimage * 20:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:28 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] (duration: 07m 18s) * 20:10 arlolra@deploy1003: arlolra, mmartorana: Continuing with deployment * 20:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 20:09 arlolra@deploy1003: arlolra, mmartorana: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:07 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1284687{{!}}Provide page context for LintErrorChecker (T419596)]], [[gerrit:1284771{{!}}Make email confirmation banner a standalone RL module (T425677)]] * 20:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1022.eqiad.wmnet with OS trixie * 19:59 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:57 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 19:09 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1022.eqiad.wmnet with OS trixie * 19:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 18:51 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1022 * 18:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1022 * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:49 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:49 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1022~] - vriley@cumin1003" * 18:45 vriley@cumin1003: START - Cookbook sre.dns.netbox * 18:26 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply * 18:26 ebysans@deploy1003: helmfile 
[codfw] START helmfile.d/services/editor-analytics: apply * 18:25 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 18:24 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 18:22 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply * 18:21 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 18:20 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: apply * 18:19 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 18:18 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:06 cdanis@dns1005: END - running authdns-update * 18:04 cdanis@dns1005: START - running authdns-update * 18:02 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] (duration: 29m 24s) * 18:02 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to all wikis * 17:59 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 17:58 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 17:51 krinkle@deploy1003: krinkle: Continuing with deployment * 17:50 krinkle@deploy1003: krinkle: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 17:45 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 17:45 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 17:33 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1237662{{!}}Profiler: Set explicit "excimer-wall" redis channel instead of concat]] * 17:32 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 17:32 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 17:06 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: restart * 16:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet,db1245.eqiad.wmnet with reason: restart * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:48 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:47 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 16:35 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 16:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 16:33 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 16:32 jynus: restarting backup1-* database primary hosts * 16:30 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2183.codfw.wmnet,db1204.eqiad.wmnet with reason: restart * 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 36 hosts with reason: restart * 16:14 sukhe@dns1004: END - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:13 sukhe@dns1004: START - running authdns-update * 16:12 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE 
helmfile.d/services/miscweb: apply * 16:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 16:01 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:50 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on ms-backup[2003-2004].codfw.wmnet,ms-backup[1003-1004].eqiad.wmnet with reason: restart * 15:44 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox and A:ulsfo and (A:dnsbox) * 15:32 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 15:32 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 15:31 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:31 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling 
P<nowiki>{</nowiki>lvs4009.ulsfo.wmnet<nowiki>}</nowiki> and A:liberica * 15:23 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts * 15:22 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts * 15:18 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:18 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 15:15 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet * 15:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo for service: upload-addrs [reason: no reason specified, no task ID specified] * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 15:06 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 15:05 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 15:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:03 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 15:03 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 15:01 akhatun: Deployed refinery using scap, then deployed onto hdfs * 14:58 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad * 14:54 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply * 14:54 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 14:53 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 14:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 14:52 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 14:52 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 14:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 14:44 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] (duration: 02m 01s) * 14:43 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply * 14:43 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply * 14:42 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (thin): Regular analytics weekly train THIN [analytics/refinery@4734c67c] * 14:40 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] (duration: 04m 38s) * 14:40 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad * 14:37 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply * 14:36 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply * 14:36 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67]: Regular analytics weekly train [analytics/refinery@4734c67c] * 14:35 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply * 14:35 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply * 14:33 akhatun@deploy1003: Finished deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] (duration: 01m 54s) * 14:32 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:32 slyngshede@dns1004: END - running authdns-update * 14:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:31 akhatun@deploy1003: Started deploy [analytics/refinery@4734c67] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4734c67c] * 14:31 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 14:31 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply * 14:30 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply * 14:30 slyngshede@dns1004: START - running authdns-update * 14:30 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply * 14:30 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: apply * 14:30 akhatun: Deploying Refinery at {{Gerrit|4734c67}} for weekly deployment train * 14:30 jmm@dns1004: END - running authdns-update * 14:29 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply * 14:28 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply * 14:28 jmm@dns1004: START - running authdns-update * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:28 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating DNS snippets - slyngshede@cumin1003" * 14:26 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply * 14:26 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply * 14:25 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: apply * 14:24 slyngshede@cumin1003: START - Cookbook sre.dns.netbox * 14:12 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw * 14:12 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: 
apply * 14:12 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply * 14:10 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply * 13:53 jasmine@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw * 13:34 stran@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] (duration: 09m 05s) * 13:30 stran@deploy1003: stran: Continuing with deployment * 13:27 stran@deploy1003: stran: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:25 stran@deploy1003: Started scap sync-world: Backport for [[gerrit:1284553{{!}}Enable staggered rollout for IRS on enwiki (T424008)]], [[gerrit:1284569{{!}}Fix when user is considered exposed to the feature in the experiment (T424075)]] * 13:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 13:10 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] (duration: 06m 55s) * 13:06 jforrester@deploy1003: rzl, jforrester, hartman: Continuing with deployment * 13:05 jforrester@deploy1003: rzl, jforrester, hartman: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1284547{{!}}Remove the progress bar]], [[gerrit:1275467{{!}}mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches (T423311)]] * 13:02 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:58 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:55 sukhe@cumin1003: START - Cookbook sre.dns.netbox * 12:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 12:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 12:50 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 12:45 sukhe@dns1004: FAIL - running authdns-update * 12:44 sukhe@dns1004: START - running authdns-update * 12:30 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1205.eqiad.wmnet with OS trixie * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5004.wikimedia.org * 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
install5004.wikimedia.org with OS bookworm * 12:23 slyngshede@dns1004: FAIL - running authdns-update * 12:21 slyngshede@dns1004: START - running authdns-update * 12:18 moritzm: installing init-system-helpers bugfix updates from Bookworm point release * 12:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add lswtest back as being planned won't work - cmooney@cumin1003" * 12:12 slyngshede@dns1004: FAIL - running authdns-update * 12:11 slyngshede@dns1004: START - running authdns-update * 12:11 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 12:11 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 12:11 slyngshede@cumin1003: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=ulsfo,service=authdns-update [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 12:08 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2168: after reimage to trixie * 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 12:02 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 12:02 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 12:02 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1205.eqiad.wmnet with reason: host reimage * 12:00 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5004.wikimedia.org with reason: host reimage * 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool 
db1227: after reimage to trixie * 11:47 root@cumin1003: START - Cookbook sre.hosts.reimage for host db1205.eqiad.wmnet with OS trixie * 11:46 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1205.eqiad.wmnet with reason: reimage * 11:43 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:43 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:40 root@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2184.codfw.wmnet with OS trixie * 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org * 11:36 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:35 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org * 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2168: after reimage to trixie * 11:19 root@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:17 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS trixie * 11:16 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:15 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:15 root@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2184.codfw.wmnet with reason: host reimage * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92412 and previous config saved to /var/cache/conftool/dbconfig/20260507-111424-fceratto.json * 11:13 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1227: after reimage to trixie * 
11:11 moritzm: installing modsecurity-apache security updates * 11:10 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS trixie * 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5004.wikimedia.org with OS bookworm * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92409 and previous config saved to /var/cache/conftool/dbconfig/20260507-110415-fceratto.json * 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 11:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:59 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:59 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:58 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:58 
root@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host db2184 * 10:58 root@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2184 * 10:57 root@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2184 * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: START - Cookbook sre.dns.wipe-cache db2184.codfw.wmnet 129.32.192.10.in-addr.arpa 9.2.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:57 root@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] (duration: 08m 40s) * 10:55 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:54 root@cumin1003: START - Cookbook sre.dns.netbox * 10:54 fceratto@cumin1003: 
dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92407 and previous config saved to /var/cache/conftool/dbconfig/20260507-105407-fceratto.json * 10:51 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 10:51 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage * 10:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 10:49 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:49 root@cumin1003: START - Cookbook sre.hosts.move-vlan for host db2184 * 10:48 root@cumin1003: START - Cookbook sre.hosts.reimage for host db2184.codfw.wmnet with OS trixie * 10:48 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:48 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:47 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 10:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284592{{!}}Close Russian Wikinews (T421796)]] * 10:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92406 and previous config saved to /var/cache/conftool/dbconfig/20260507-104359-fceratto.json * 10:42 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage * 10:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2184.codfw.wmnet with reason: reimage * 10:40 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 
daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:39 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 10:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92405 and previous config saved to /var/cache/conftool/dbconfig/20260507-103349-fceratto.json * 10:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:32 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS trixie * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5002.wikimedia.org * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:31 marostegui@cumin1003: 
END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2168: Reimage to Trixie * 10:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Reimage to Trixie * 10:30 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2182: after reimage to trixie * 10:28 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS trixie * 10:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1227: Reimage to Trixie * 10:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage to Trixie * 10:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: after reimage to trixie * 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:21 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:20 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5002.wikimedia.org * 10:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:14 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 10:13 moritzm: rebalance ganeti cluster in ulsfo following host 
reimages [[phab:T424686|T424686]] * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy5001.wikimedia.org * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:11 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS trixie * 10:10 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:04 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 10:03 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 09:59 jmm@cumin2002: START - Cookbook sre.dns.netbox * 09:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy5001.wikimedia.org * 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage * 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2182: after reimage to trixie * 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
db2182.codfw.wmnet with OS trixie * 09:39 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1202: after reimage to trixie * 09:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1202.eqiad.wmnet with OS trixie * 09:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance * 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4003.wikimedia.org to drbd * 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 09:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:24 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1006.eqiad.wmnet * 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie * 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM bast4006.wikimedia.org * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 09:11 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2182.codfw.wmnet with reason: host reimage * 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM bast4006.wikimedia.org * 09:08 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2208: After reimage * 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1202.eqiad.wmnet with reason: host reimage * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2182.codfw.wmnet with OS 
trixie * 08:52 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1202.eqiad.wmnet with OS trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Reimage to Trixie * 08:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2182: Reimage to Trixie * 08:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Reimage to Trixie * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Reimage to Trixie * 08:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Reimage to Trixie * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2144.codfw.wmnet * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2144.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 08:37 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2144.codfw.wmnet * 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus4003.ulsfo.wmnet to drbd * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:28 
marostegui@cumin1003: dbctl commit (dc=all): 'Remove db2144 [[phab:T425522|T425522]]', diff saved to https://phabricator.wikimedia.org/P92389 and previous config saved to /var/cache/conftool/dbconfig/20260507-082822-marostegui.json * 08:23 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:23 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db2208: After reimage * 08:23 XioNoX: drmrs remove old v6 gateway IP * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:22 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2208: After reimage * 08:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: drmrs v6 gateway IPs change - ayounsi@cumin1003" * 08:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4004.ulsfo.wmnet to drbd * 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 08:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 08:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet * 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet * 08:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 08:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01 * 07:54 dcausse@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] (duration: 09m 46s) * 07:49 dcausse@deploy1003: dcausse: Continuing with deployment * 07:46 dcausse@deploy1003: dcausse: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:44 dcausse@deploy1003: Started scap sync-world: Backport for [[gerrit:1269465{{!}}search: add alt. completion indices to test keyword tokenizer (2/2) (T420427)]] * 07:32 moritzm: installing apache2 security updates * 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow4003.ulsfo.wmnet to drbd * 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet * 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet * 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir4003.ulsfo.wmnet to drbd * 06:42 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host 
ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:41 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 01 * 06:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2207: after reimage to trixie * 05:54 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2207: after reimage to trixie * 05:51 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS trixie * 05:33 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS trixie * 05:28 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage * 05:09 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:04 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage * 05:03 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS trixie * 05:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2207: Reimage to Trixie * 05:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Reimage to Trixie * 04:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2207 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92383 and previous config saved to /var/cache/conftool/dbconfig/20260507-045219-marostegui.json * 04:51 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2204 to s2 primary 
[[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92382 and previous config saved to /var/cache/conftool/dbconfig/20260507-045141-marostegui.json * 04:51 marostegui: Starting s2 codfw failover from db2207 to db2204 - [[phab:T424848|T424848]] * 04:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 [[phab:T424848|T424848]] * 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2204 with weight 0 [[phab:T424848|T424848]]', diff saved to https://phabricator.wikimedia.org/P92381 and previous config saved to /var/cache/conftool/dbconfig/20260507-044651-marostegui.json * 04:46 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 35s) * 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:15 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] (duration: 12m 57s) * 01:09 zabe@deploy1003: zabe: Continuing with deployment * 01:09 zabe@deploy1003: zabe: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 01:02 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281485{{!}}Drop some unneeded wikinews configs (T421796)]] * 01:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 00:43 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] (duration: 33m 54s) * 00:31 zabe@deploy1003: zabe: Continuing with deployment * 00:29 zabe@deploy1003: zabe: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:10 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1277783{{!}}Undeploy GoogleNewsSitemap (T421798)]] == 2026-05-06 == * 23:41 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 23:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1021.eqiad.wmnet with OS trixie * 23:14 ladsgroup@deploy1003: Synchronized portals: Sync portals for removal of Wikinews (duration: 02m 22s) * 23:12 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Sync portals for removal of Wikinews (duration: 06m 12s) * 22:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] (duration: 07m 08s) * 22:46 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:45 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:43 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1284004{{!}}Close Spanish Wikinews (T421796)]] * 22:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] (duration: 06m 40s) * 22:28 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:28 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:26 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283872{{!}}Close English Wikinews (T421796)]] * 22:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host pc1021.eqiad.wmnet with OS trixie * 22:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:14 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] (duration: 06m 25s) * 22:11 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:11 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:10 cjming@deploy1003: cjming: Continuing with deployment * 22:10 cjming@deploy1003: cjming: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 22:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:08 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1283972{{!}}UBN fix: guard entry.serverTiming before forEach (T425591)]] * 22:06 vriley@cumin1003: START - Cookbook sre.dns.netbox * 22:05 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 22:04 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:52 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] (duration: 06m 56s) * 21:48 zabe@deploy1003: zabe: Continuing with deployment * 21:47 zabe@deploy1003: zabe: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:45 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1283953{{!}}Disable GNSM on dewikinews (T421798)]] * 21:31 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:28 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1012.eqiad.wmnet with OS trixie * 21:26 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:22 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:17 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:14 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 21:11 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [pc1021] - vriley@cumin1003" * 21:07 vriley@cumin1003: START - Cookbook sre.dns.netbox * 21:06 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1021 * 21:05 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc1021 * 21:04 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host pc1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 20:28 catrope@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] (duration: 09m 12s) * 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 20:24 catrope@deploy1003: catrope, somerandomdeveloper: Continuing with deployment * 20:21 catrope@deploy1003: catrope, somerandomdeveloper: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:19 catrope@deploy1003: Started scap sync-world: Backport for [[gerrit:1281526{{!}}Replace use of $wgRequest (T336703)]] * 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 20:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS trixie * 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS trixie * 19:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 19:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4010.ulsfo.wmnet with OS bookworm * 19:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 19:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage * 18:59 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 18:59 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 18:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:55 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: 
apply * 18:55 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:55 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:54 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 18:54 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:53 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS trixie * 18:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 18:47 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:42 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 18:42 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:42 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 18:41 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 18:40 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 18:39 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 18:37 dzahn@dns1005: END - running authdns-update * 18:35 dzahn@dns1005: START - running authdns-update * 18:33 brennen: 
1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): blockers resolved, rolling to group1 * 18:31 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 18:29 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS bookworm * 18:02 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo * 18:01 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 17:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo * 17:55 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo * 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply * 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply * 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply * 17:32 swfrench@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/shellbox: apply * 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply * 17:28 topranks: rebooting asw1-23-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 17:27 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-23-ulsfo,asw1-23-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply * 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply * 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply * 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply * 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:16 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 17:15 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply * 17:15 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply * 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply * 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply * 17:08 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE 
helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply * 17:07 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply * 17:06 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply * 17:02 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 39 hosts with reason: ulsfo depooled for switch work * 16:53 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw1-22-ulsfo,asw1-22-ulsfo IPv6 with reason: upgrading sr-linux on asw1-23-ulsfo * 16:52 topranks: rebooting asw1-22-ulsfo to upgrade SR-Linux OS on switch [[phab:T408892|T408892]] * 16:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 16:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS trixie * 16:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . 
* 16:37 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bookworm
* 16:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bookworm
* 16:28 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
* 16:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 16:04 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
* 15:58 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
* 15:57 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
* 15:38 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bookworm
* 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS bookworm
* 15:30 jasmine@cumin2002: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
* 15:08 sukhe: sudo cumin -b1 -s5 "C:bird and not dns4004*" "run-puppet-agent --enable 'merging CR 1282958'"
* 15:08 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-eqiad cluster: Change Confluent distribution.
* 15:06 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] (duration: 06m 41s)
* 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 15:02 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 15:01 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:59 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283805{{!}}Close Chinese Wikinews (T421796)]]
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5002.eqsin.wmnet
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie
* 14:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5002.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 14:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
* 14:36 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
* 14:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 14:34 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing bird change]
* 14:31 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing bird change]
* 14:30 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] (duration: 11m 16s)
* 14:28 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
* 14:26 kharlan@deploy1003: kharlan: Continuing with deployment
* 14:25 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1282958'"
* 14:23 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
* 14:22 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
* 14:21 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
* 14:21 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
* 14:21 kharlan@deploy1003: kharlan: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5002.eqsin.wmnet * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm * 14:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 14:20 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1283050{{!}}Add user_groups to editAttemptStep schema (T424010)]] * 14:19 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum5001.eqsin.wmnet * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:15 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] (duration: 06m 40s) * 14:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:13 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:12 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
durum5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 14:11 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:11 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie * 14:10 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 14:10 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:10 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:09 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:08 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:08 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283783{{!}}Close German Wikinews (T421796)]] * 14:08 jmm@cumin2002: START - Cookbook sre.dns.netbox * 14:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] (duration: 11m 28s) * 14:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum5001.eqsin.wmnet * 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:56 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage * 13:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:55 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie * 13:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: restart to test fixes from [[phab:T425301|T425301]] - bking@cumin2002 * 13:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1194: after reimage to trixie * 13:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283751{{!}}Close French Wikinews (T421796)]] * 13:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:45 jgreen@dns1004: END - running authdns-update * 13:44 alexsanford@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] (duration: 30m 53s) * 13:44 jgreen@dns1004: START - running authdns-update * 13:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage * 13:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm * 13:35 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.ulsfo.wmnet on all recursors * 13:34 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.ulsfo.wmnet on all recursors * 13:32 alexsanford@deploy1003: alexsanford: Continuing with deployment * 13:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:31 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4010.ulsfo.wmnet with reason: host reimage * 13:31 
alexsanford@deploy1003: alexsanford: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:28 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4008.ulsfo.wmnet'] * 13:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:26 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 13:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:19 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host 
cp4046.ulsfo.wmnet with OS trixie * 13:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ganeti4008.mgmt.ulsfo.wmnet on all recursors * 13:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:18 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for ganeti4008 mgmt - cmooney@cumin1003" * 13:15 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 13:14 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:14 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 13:13 alexsanford@deploy1003: Started scap sync-world: Backport for [[gerrit:1283028{{!}}Add messages related to mandatory 2FA for more groups (T423119)]] * 13:12 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4010.ulsfo.wmnet with OS trixie * 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:08 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage * 13:05 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1194: after reimage to trixie * 13:05 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 13:01 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS trixie * 12:49 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS trixie * 12:45 jelto@deploy1003: helmfile [aux-k8s-codfw] START 
helmfile.d/services/miscweb: apply * 12:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2012.codfw.wmnet with OS trixie * 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: update * 12:35 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage * 12:24 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:21 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2012.codfw.wmnet with reason: host reimage * 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS trixie * 12:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie * 12:16 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie * 12:16 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 12:15 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 12:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb2011.codfw.wmnet with OS trixie * 12:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] (duration: 06m 28s) * 12:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with 
deployment * 12:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:05 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2012.codfw.wmnet with OS trixie * 12:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283735{{!}}Close Polish Wikinews (T421796)]] * 12:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 11:57 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2011.codfw.wmnet with reason: host reimage * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet * 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:50 moritzm: installing openjdk-17 security updates * 11:50 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92374 and previous config saved to /var/cache/conftool/dbconfig/20260506-114919-fceratto.json * 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet * 11:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1194: Reimage to Trixie * 11:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2160.codfw.wmnet with reason: Reboot * 11:44 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1194: Reimage to Trixie * 11:44 jiji@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
rdb2011.codfw.wmnet with reason: host reimage * 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Reimage to Trixie * 11:42 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:41 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage * 11:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92372 and previous config saved to /var/cache/conftool/dbconfig/20260506-113910-fceratto.json * 11:30 jiji@cumin1003: START - Cookbook sre.hosts.reimage for host rdb2011.codfw.wmnet with OS trixie * 11:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048', diff saved to https://phabricator.wikimedia.org/P92371 and previous config saved to /var/cache/conftool/dbconfig/20260506-112903-fceratto.json * 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002" * 11:20 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie * 11:19 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie * 11:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92370 and previous config saved to /var/cache/conftool/dbconfig/20260506-111854-fceratto.json * 11:14 
slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie * 11:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie * 11:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot * 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage * 10:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage * 10:39 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage * 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm * 10:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti4006.ulsfo.wmnet'] * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1048 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92369 and previous 
config saved to /var/cache/conftool/dbconfig/20260506-101836-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1048.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92368 and previous config saved to /var/cache/conftool/dbconfig/20260506-101808-fceratto.json * 10:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie * 10:16 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie * 10:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92367 and previous config saved to /var/cache/conftool/dbconfig/20260506-100800-fceratto.json * 09:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040', diff saved to https://phabricator.wikimedia.org/P92366 and previous config saved to /var/cache/conftool/dbconfig/20260506-095752-fceratto.json * 09:55 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92365 and previous config saved to /var/cache/conftool/dbconfig/20260506-094744-fceratto.json * 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 09:40 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage * 
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:32 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:27 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1040 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92364 and previous config saved to /var/cache/conftool/dbconfig/20260506-092414-fceratto.json * 09:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1040.eqiad.wmnet with reason: Maintenance * 09:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006/8 mgmt - ayounsi@cumin1003" * 09:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92363 and previous config saved to /var/cache/conftool/dbconfig/20260506-092345-fceratto.json * 09:17 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:17 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS 
trixie * 09:16 ayounsi@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97) * 09:15 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: update * 09:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repool ms2 [[phab:T418979|T418979]]ç', diff saved to https://phabricator.wikimedia.org/P92362 and previous config saved to /var/cache/conftool/dbconfig/20260506-091513-marostegui.json * 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:14 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99) * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 09:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2253: Replacing HW [[phab:T418979|T418979]] * 09:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92361 and previous config saved to /var/cache/conftool/dbconfig/20260506-091337-fceratto.json * 09:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039', diff saved to https://phabricator.wikimedia.org/P92360 and previous config saved to /var/cache/conftool/dbconfig/20260506-090329-fceratto.json * 09:03 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] (duration: 08m 44s) * 08:59 zabe@deploy1003: zabe: Continuing with deployment * 08:56 zabe@deploy1003: zabe: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 08:54 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281894{{!}}Correctly support new file tables in RevisionDeleteUser (T424553)]] * 08:53 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92359 and previous config saved to /var/cache/conftool/dbconfig/20260506-085321-fceratto.json * 08:43 marostegui@cumin1003: dbctl commit (dc=all): 'Add db2253 to ms2 [[phab:T418973|T418973]]', diff saved to https://phabricator.wikimedia.org/P92358 and previous config saved to /var/cache/conftool/dbconfig/20260506-084337-marostegui.json * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling es1039 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92357 and previous config saved to /var/cache/conftool/dbconfig/20260506-083841-fceratto.json * 08:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance * 08:29 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:09 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 08:08 marostegui@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2208.codfw.wmnet with OS trixie * 08:06 awight: EU morning deployment is done * 08:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Replacing hw * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2144: Replacing HW [[phab:T418979|T418979]] * 07:59 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0) * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache * 07:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool 
db2144: Replacing HW [[phab:T418979|T418979]] * 07:47 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS trixie * 07:40 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] (duration: 08m 58s) * 07:36 awight@deploy1003: wmde-fisch, awight, dcausse: Continuing with deployment * 07:33 awight@deploy1003: wmde-fisch, awight, dcausse: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can * 07:31 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283101{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]], [[gerrit:1283037{{!}}search: fix alt. completion indices to test keyword tokenizer (T420427)]], [[gerrit:1283041{{!}}search: enable Latin-to-Devanagari transliteration second-chance (T425018)]] * 07:26 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] (duration: 07m 37s) * 07:22 awight@deploy1003: awight, lilients: Continuing with deployment * 07:21 awight@deploy1003: awight, lilients: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:19 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1283033{{!}}VE: Avoid counting all refs when listIndex is undefined (T425433)]] * 07:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4008.ulsfo.wmnet * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 07:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4008.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:55 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1191: after reimage to trixie * 06:51 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1189: after reimage to trixie * 06:48 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4008.ulsfo.wmnet * 06:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti4006.ulsfo.wmnet * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti4006.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002" * 06:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:20 jmm@cumin2002: START - Cookbook 
sre.hosts.decommission for hosts ganeti4006.ulsfo.wmnet * 05:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2208.codfw.wmnet with reason: Idrac issues [[phab:T425506|T425506]] * 05:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: host reimage * 05:26 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:25 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2208: Reimage to Trixie * 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Reimage to Trixie * 05:23 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS trixie * 05:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1191: Reimage to Trixie * 05:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Reimage to Trixie * 05:19 
marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1189.eqiad.wmnet with OS trixie * 05:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1189: Reimage to Trixie * 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1189.eqiad.wmnet with reason: Reimage to Trixie * 05:11 marostegui@dns1004: END - running authdns-update * 05:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1189 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92345 and previous config saved to /var/cache/conftool/dbconfig/20260506-050948-marostegui.json * 05:09 marostegui@dns1004: START - running authdns-update * 05:08 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92344 and previous config saved to /var/cache/conftool/dbconfig/20260506-050816-marostegui.json * 05:07 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92343 and previous config saved to /var/cache/conftool/dbconfig/20260506-050755-marostegui.json * 05:06 marostegui: Starting s3 eqiad failover from db1189 to db1223 - [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T425318|T425318]] * 05:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1223 with weight 0 [[phab:T425318|T425318]]', diff saved to https://phabricator.wikimedia.org/P92342 and previous config saved to /var/cache/conftool/dbconfig/20260506-050342-marostegui.json * 03:28 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 03:27 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 37s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:05 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] (duration: 06m 26s) * 00:49 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283061{{!}}Close Dutch Wikinews (T421796)]] * 00:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:41 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage * 00:27 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] (duration: 07m 26s) * 00:25 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1001 * 00:25 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1001 * 00:24 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS trixie * 00:23 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:21 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:20 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283117{{!}}Close Italian Wikinews (T421796)]] == 2026-05-05 == * 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip addresses for nodes in rack 23 - pt1979@cumin2002" * 23:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 22:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] (duration: 06m 58s) * 22:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283104{{!}}Close Arabic Wikinews (T421796)]] * 22:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] (duration: 06m 28s) * 22:39 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:37 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283103{{!}}Close Ukrainian Wikinews (T421796)]] * 22:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] (duration: 07m 56s) * 22:22 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:20 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:18 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283100{{!}}Close Romanian Wikinews (T421796)]] * 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] (duration: 06m 45s) * 22:12 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 22:11 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283099{{!}}Close Serbian Wikinews (T421796)]] * 22:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] (duration: 11m 07s) * 21:59 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 21:58 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:54 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283066{{!}}Close Persian Wikinews (T421796)]] * 21:49 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] (duration: 32m 55s) * 21:36 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Continuing with deployment * 21:33 arlolra@deploy1003: jdlrobson, mmartorana, arlolra: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. (T417513)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:16 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1281501{{!}}Email confirmation banner: Remove obsolete arm_b variant (T421366)]], [[gerrit:1283056{{!}}Legacy parser no longer varies by user thumbnail size. 
(T417513)]] * 20:59 dancy@deploy1003: Installation of scap version "4.262.1" completed for 2 hosts * 20:57 dancy@deploy1003: Installing scap version "4.262.1" for 2 host(s) * 20:57 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] (duration: 10m 59s) * 20:52 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Continuing with deployment * 20:48 arlolra@deploy1003: mpostoronca, h2o, awight, arlolra: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be ve * 20:46 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1282930{{!}}hCaptcha: Add diagnostic context to script load error logs (T424496)]], [[gerrit:1282397{{!}}sectionCollapsing: Scroll to fragment target on init (T425290)]], [[gerrit:1282804{{!}}Errors added below ref list dirty when not responsive (T384599)]] * 20:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie * 20:22 arlolra@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] (duration: 10m 30s) * 20:20 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS trixie * 20:18 arlolra@deploy1003: aaron, neriah, arlolra: Continuing with deployment * 20:14 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:13 arlolra@deploy1003: aaron, neriah, arlolra: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:12 arlolra@deploy1003: Started scap sync-world: Backport for [[gerrit:1283082{{!}}Enable WikiLove on shwiki (T424891)]], [[gerrit:1276814{{!}}Add wikibase.v1 module to the sandbox were it is present (T422403)]] * 20:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 20:09 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 20:07 pt1979@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage * 20:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:57 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage * 19:55 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 19:55 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:54 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 19:45 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1002 * 19:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1002 * 19:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:40 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:39 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1002 * 19:39 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1002.eqiad.wmnet 142.32.64.10.in-addr.arpa 2.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:39 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1002 - herron@cumin1003" * 19:38 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 
Update records for host kafka-logging1002 - herron@cumin1003" * 19:32 herron@cumin1003: START - Cookbook sre.dns.netbox * 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 19:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1002 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS trixie * 19:17 dancy@deploy1003: Installation of scap version "4.262.0" completed for 2 hosts * 19:15 dancy@deploy1003: Installing scap version "4.262.0" for 2 host(s) * 19:15 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: rebooting firewall in desperation * 19:14 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 19:05 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:05 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set correct vlan group in netbox for new ulsfo vlans - cmooney@cumin1003 - [[phab:T408892|T408892]]" * 19:04 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 19:03 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] (duration: 10m 59s) * 18:56 ladsgroup@deploy1003: ladsgroup: 
Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:52 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283063{{!}}Close Swedish Wikinews (T421796)]] * 18:49 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:47 brennen@deploy1003: Finished scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] (duration: 36m 04s) * 18:44 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS trixie * 18:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cp4038 ip address - pt1979@cumin2002" * 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 18:30 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:25 pt1979@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with 
OS trixie * 18:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-codfw * 18:13 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-codfw * 18:13 pt1979@cumin1003: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie * 18:11 brennen@deploy1003: Started scap sync-world: testwikis to 1.47.0-wmf.1 refs [[phab:T423910|T423910]] * 18:10 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device pfw1a-eqiad * 18:10 cmooney@cumin1003: START - Cookbook sre.network.tls for network device pfw1a-eqiad * 18:06 brennen: 1.47.0-wmf.1 train status ([[phab:T423910|T423910]]): no current blockers, rolling to group0 * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:38 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage * 17:33 herron@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 17:32 herron@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 17:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. 
* 17:21 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging1003 * 17:21 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging1003 * 17:20 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:19 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:16 herron@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging1003.eqiad.wmnet 66.48.64.10.in-addr.arpa 6.6.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:15 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging1003 - herron@cumin1003" * 17:12 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:09 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging1003 * 17:08 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS trixie * 17:05 sukhe: sudo 
cumin -b11 "A:cp and not P<nowiki>{</nowiki>cp2041* or cp2042*<nowiki>}</nowiki> and not A:ulsfo" "run-puppet-agent --enable 'merging CR 1282979'" * 16:58 sbassett@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] (duration: 07m 25s) * 16:53 sbassett@deploy1003: mstyles, sbassett: Continuing with deployment * 16:52 sbassett@deploy1003: mstyles, sbassett: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdeb * 16:50 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]], [[gerrit:1283049{{!}}Remove undefined variable $wmgUseCSPReportOnlyHasSession (T419612 T420604 T420607)]] * 16:38 sbassett@deploy1003: Started scap sync-world: Backport for [[gerrit:1283036{{!}}Set $wgReauthenticateTime editsitejs to one hour (T197137)]], [[gerrit:1283020{{!}}Set CSP to enforce with allow-listed domains in Wikimedia production (T419612 T420604 T420607)]] * 16:19 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 16:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 16:18 elukey@deploy1003: 
helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 16:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] (duration: 06m 16s) * 16:07 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:07 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:05 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283040{{!}}Close Japanese Wikinews (T421796)]] * 16:01 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] (duration: 07m 53s) * 15:57 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:55 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync * 15:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:55 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync * 15:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync * 15:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync * 15:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283039{{!}}Close Korean Wikinews (T421796)]] * 15:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] (duration: 06m 12s) * 15:48 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:47 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283038{{!}}Close Finnish Wikinews (T421796)]] * 15:42 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 15:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync * 15:39 dzahn@dns1005: END - running authdns-update * 15:38 mutante: deleting mwmaint.discovery.wmnet DNS entry - the hosts behind it dont exist anymore * 15:37 dzahn@dns1005: START - running authdns-update * 15:24 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:24 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 15:21 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply * 15:20 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply * 15:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] (duration: 06m 17s) * 15:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92340 and previous config saved to /var/cache/conftool/dbconfig/20260505-151930-fceratto.json * 15:16 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:16 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:14 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283024{{!}}Close Czech Wikinews (T421796)]] * 15:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92339 and previous config saved to /var/cache/conftool/dbconfig/20260505-150921-fceratto.json * 15:08 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] (duration: 07m 06s) * 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' . * 15:04 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:01 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1283003{{!}}Close Tamil Wikinews (T421796)]] * 14:59 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] (duration: 07m 48s) * 14:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P92338 and previous config saved to /var/cache/conftool/dbconfig/20260505-145913-fceratto.json * 14:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:57 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:55 urbanecm@deploy1003: urbanecm: Continuing with deployment * 14:53 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92337 and previous config saved to /var/cache/conftool/dbconfig/20260505-145231-fceratto.json * 14:51 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1283002{{!}}fix: wrong property name action_data (T425425)]] * 14:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92336 and previous config saved to /var/cache/conftool/dbconfig/20260505-144905-fceratto.json * 14:44 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92335 and previous config saved to /var/cache/conftool/dbconfig/20260505-144223-fceratto.json * 14:42 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 14:41 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 14:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2247 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92334 and previous config saved to /var/cache/conftool/dbconfig/20260505-144029-fceratto.json * 14:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance * 14:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to 
https://phabricator.wikimedia.org/P92333 and previous config saved to /var/cache/conftool/dbconfig/20260505-143958-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P92332 and previous config saved to /var/cache/conftool/dbconfig/20260505-143214-fceratto.json * 14:30 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad * 14:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92331 and previous config saved to /var/cache/conftool/dbconfig/20260505-142949-fceratto.json * 14:28 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master1001.eqiad.wmnet * 14:25 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage * 14:24 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master1001.eqiad.wmnet * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92329 and previous config saved to /var/cache/conftool/dbconfig/20260505-142206-fceratto.json * 14:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P92328 and previous config saved to /var/cache/conftool/dbconfig/20260505-141941-fceratto.json * 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet * 14:11 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host 
kafka-logging1004 * 14:10 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS trixie * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1015.eqiad.wmnet * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:09 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92327 and previous config saved to /var/cache/conftool/dbconfig/20260505-140933-fceratto.json * 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet * 14:07 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:06 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:05 eevans@cumin1003: START - Cookbook sre.dns.netbox * 14:05 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad * 14:05 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw * 14:04 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync * 14:04 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: 
sync * 14:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 14:03 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 14:03 Lucas_WMDE: UTC afternoon backport+config window done * 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM config-master2001.codfw.wmnet * 14:02 jasmine@cumin2002: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. * 14:01 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1015.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1014.eqiad.wmnet * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:01 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:01 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2246 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92326 and previous config saved to /var/cache/conftool/dbconfig/20260505-140047-fceratto.json * 14:00 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92325 and previous config saved to /var/cache/conftool/dbconfig/20260505-140016-fceratto.json * 13:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1227: Repooling * 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM config-master2001.codfw.wmnet * 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 13:55 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] (duration: 06m 22s) * 13:50 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1014.eqiad.wmnet * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92323 and previous config saved to /var/cache/conftool/dbconfig/20260505-135008-fceratto.json * 13:50 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 13:49 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:49 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1011.eqiad.wmnet * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:47 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:47 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282988{{!}}Close Portuguese Wikinews (T421796)]] * 13:47 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:45 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2209 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92321 and previous config saved to /var/cache/conftool/dbconfig/20260505-134522-fceratto.json * 13:45 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance * 13:44 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1227: Repooling * 13:44 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:43 jasmine@cumin2002: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-main-codfw cluster: Change Confluent distribution. 
* 13:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92319 and previous config saved to /var/cache/conftool/dbconfig/20260505-134257-fceratto.json * 13:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P92318 and previous config saved to /var/cache/conftool/dbconfig/20260505-134000-fceratto.json * 13:37 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1011.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1010.eqiad.wmnet * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:37 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:37 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1003" * 13:33 eevans@cumin1003: START - Cookbook sre.dns.netbox * 13:30 Msz2001: UTC afternoon backport window done * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92317 and previous config saved to /var/cache/conftool/dbconfig/20260505-132952-fceratto.json * 13:27 eevans@cumin1003: START - Cookbook sre.hosts.decommission for hosts aqs1010.eqiad.wmnet * 13:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 13:23 jgiannelos@deploy1003: helmfile 
[eqiad] START helmfile.d/services/mw-parsoid: apply * 13:23 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] (duration: 08m 37s) * 13:23 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 13:22 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 13:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on dborch1002.wikimedia.org with reason: [[phab:T416582|T416582]] * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2245 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92316 and previous config saved to /var/cache/conftool/dbconfig/20260505-132002-fceratto.json * 13:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance * 13:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92315 and previous config saved to /var/cache/conftool/dbconfig/20260505-131931-fceratto.json * 13:19 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Continuing with deployment * 13:16 mszwarc@deploy1003: mszwarc, jhsoby, matmarex, d3r1ck01: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug * 13:15 mszwarc@deploy1003: Started scap 
sync-world: Backport for [[gerrit:1270882{{!}}Remove temporary `wgOAuth2UsePrefixedSub` feature flag (T417690)]], [[gerrit:1271969{{!}}Move privileged global and local group handling to WikimediaCustomizations (T418507)]], [[gerrit:1281964{{!}}Add Akan (ak) to wmgExtraLanguageNames by default (T333765 T425256)]] * 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet * 13:11 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] (duration: 07m 55s) * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:11 atsuko@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92314 and previous config saved to /var/cache/conftool/dbconfig/20260505-130923-fceratto.json * 13:07 mszwarc@deploy1003: mszwarc: Continuing with deployment * 13:05 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:03 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1282850{{!}}Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis (T418484)]] * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P92313 and previous config saved to /var/cache/conftool/dbconfig/20260505-125915-fceratto.json * 12:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] (duration: 07m 23s) * 12:52 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 12:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 12:49 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282970{{!}}Close Esperanto Wikinews (T421796)]] * 12:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92312 and previous config saved to /var/cache/conftool/dbconfig/20260505-124907-fceratto.json * 12:44 sgimeno@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] (duration: 03m 56s) * 12:43 sgimeno@deploy1003: sgimeno: Continuing with deployment * 12:42 moritzm: installing node-tar security updates * 12:41 sgimeno@deploy1003: sgimeno: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:40 sgimeno@deploy1003: Started scap sync-world: Backport for [[gerrit:1280226{{!}}loggedOutWarning: instrument browser navigation and tab close (T421518)]] * 12:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2240 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92311 and previous config saved to /var/cache/conftool/dbconfig/20260505-124041-fceratto.json * 12:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance * 12:36 moritzm: installing imagemagick security updates * 12:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance * 12:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92310 and previous config saved to /var/cache/conftool/dbconfig/20260505-123411-fceratto.json * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:33 atsuko@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ttmserver-test: apply * 12:31 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 12:29 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 12:28 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 12:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 12:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92309 and previous config saved to /var/cache/conftool/dbconfig/20260505-122404-fceratto.json * 12:23 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 12:23 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 12:13 
fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P92308 and previous config saved to /var/cache/conftool/dbconfig/20260505-121352-fceratto.json * 12:04 moritzm: installing postgresql-13 security updates * 12:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92307 and previous config saved to /var/cache/conftool/dbconfig/20260505-120344-fceratto.json * 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] (duration: 06m 13s) * 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2005.codfw.wmnet * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2237 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92306 and previous config saved to /var/cache/conftool/dbconfig/20260505-115535-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92305 and previous config saved to /var/cache/conftool/dbconfig/20260505-115503-fceratto.json * 11:53 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:53 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2005.codfw.wmnet * 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282960{{!}}Close Shan Wikinews (T421796)]] * 11:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] (duration: 09m 21s) * 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2004.codfw.wmnet * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92304 and previous config saved to /var/cache/conftool/dbconfig/20260505-114455-fceratto.json * 11:43 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2004.codfw.wmnet * 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd2003.codfw.wmnet * 11:39 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd2003.codfw.wmnet * 11:38 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282955{{!}}Close Norwegian Wikinews (T421796)]] * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P92303 and previous config saved to /var/cache/conftool/dbconfig/20260505-113446-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92302 and previous config saved to /var/cache/conftool/dbconfig/20260505-112449-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92301 and previous config saved to /var/cache/conftool/dbconfig/20260505-112438-fceratto.json * 11:16 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2236 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92300 and previous config saved to /var/cache/conftool/dbconfig/20260505-111616-fceratto.json * 11:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92299 and previous config saved to /var/cache/conftool/dbconfig/20260505-111545-fceratto.json * 11:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92298 and previous config saved to /var/cache/conftool/dbconfig/20260505-111435-fceratto.json * 11:10 moritzm: installing ca-certificates updates from bookworm point release * 11:09 marostegui@cumin1003: END (PASS) - Cookbook 
sre.mysql.pool (exit_code=0) pool db2221: after reimage to trixie * 11:07 moritzm: installing multipart bugfix updates from bookworm point release * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92296 and previous config saved to /var/cache/conftool/dbconfig/20260505-110537-fceratto.json * 11:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:05 ayounsi@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P<nowiki>{</nowiki>lvs4009*<nowiki>}</nowiki> and A:liberica * 11:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P92295 and previous config saved to /var/cache/conftool/dbconfig/20260505-110427-fceratto.json * 11:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1174: after reimage to trixie * 10:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P92293 and previous config saved to /var/cache/conftool/dbconfig/20260505-105529-fceratto.json * 10:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92291 and previous config saved to /var/cache/conftool/dbconfig/20260505-105419-fceratto.json * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'. * 10:50 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'. * 10:49 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'. 
* 10:49 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'. * 10:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92290 and previous config saved to /var/cache/conftool/dbconfig/20260505-104521-fceratto.json * 10:40 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92288 and previous config saved to /var/cache/conftool/dbconfig/20260505-104032-fceratto.json * 10:40 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2219 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92286 and previous config saved to /var/cache/conftool/dbconfig/20260505-103702-fceratto.json * 10:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance * 10:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92285 and previous config saved to /var/cache/conftool/dbconfig/20260505-103632-fceratto.json * 10:32 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 10:29 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 10:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92283 and previous config saved to /var/cache/conftool/dbconfig/20260505-102623-fceratto.json * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 10:24 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2221: after reimage to trixie * 10:24 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 10:23 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:23 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 10:22 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 10:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS trixie * 10:17 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 10:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P92281 and previous config saved to /var/cache/conftool/dbconfig/20260505-101616-fceratto.json * 10:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1174: after reimage to trixie * 09:42 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . 
* 09:41 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply * 09:39 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply * 09:38 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply * 09:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P92271 and previous config saved to /var/cache/conftool/dbconfig/20260505-093703-fceratto.json * 09:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92270 and previous config saved to /var/cache/conftool/dbconfig/20260505-093619-fceratto.json * 09:36 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply * 09:35 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1173 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92269 and previous config saved to /var/cache/conftool/dbconfig/20260505-093305-fceratto.json * 09:32 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance * 09:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS trixie * 09:30 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS trixie * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 09:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 09:29 
marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1174: Reimage to Trixie * 09:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2221: Reimage to Trixie * 09:29 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1174: Reimage to Trixie * 09:28 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2221: Reimage to Trixie * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Reimage to Trixie * 09:28 aikochou@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 09:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Reimage to Trixie * 09:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92265 and previous config saved to /var/cache/conftool/dbconfig/20260505-092654-fceratto.json * 09:26 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply * 09:25 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply * 09:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92264 and previous config saved to /var/cache/conftool/dbconfig/20260505-092431-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2206 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92263 and previous config saved to /var/cache/conftool/dbconfig/20260505-091808-fceratto.json * 09:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 
db2206.codfw.wmnet with reason: Maintenance * 09:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92262 and previous config saved to /var/cache/conftool/dbconfig/20260505-091423-fceratto.json * 09:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance * 09:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92260 and previous config saved to /var/cache/conftool/dbconfig/20260505-091254-fceratto.json * 09:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P92259 and previous config saved to /var/cache/conftool/dbconfig/20260505-090415-fceratto.json * 09:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92258 and previous config saved to /var/cache/conftool/dbconfig/20260505-090246-fceratto.json * 08:58 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2209: after reimage to trixie * 08:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92256 and previous config saved to /var/cache/conftool/dbconfig/20260505-085407-fceratto.json * 08:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS trixie * 08:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P92255 and previous config saved to /var/cache/conftool/dbconfig/20260505-085238-fceratto.json * 08:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 08:50 moritzm: 
installing augeas security updates * 08:49 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:46 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2213 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92254 and previous config saved to /var/cache/conftool/dbconfig/20260505-084616-fceratto.json * 08:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 08:42 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' . * 08:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92253 and previous config saved to /var/cache/conftool/dbconfig/20260505-084231-fceratto.json * 08:41 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' . * 08:40 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' . * 08:38 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' . 
* 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 08:37 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' . * 08:35 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' . * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply * 08:34 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply * 08:34 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 13 hosts with reason: switches replacement * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92252 and previous config saved to /var/cache/conftool/dbconfig/20260505-083356-fceratto.json * 08:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance * 08:33 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92251 and previous config saved to /var/cache/conftool/dbconfig/20260505-083326-fceratto.json * 08:32 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply * 08:32 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply * 08:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:29 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install5004.wikimedia.org on all recursors * 08:28 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 08:28 
jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 08:24 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' . * 08:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92250 and previous config saved to /var/cache/conftool/dbconfig/20260505-082318-fceratto.json * 08:22 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2222: after reimage to trixie * 08:22 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage * 08:16 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/x-flac # [[phab:T414641|T414641]] * 08:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: after reimage to trixie * 08:14 jmm@cumin2002: START - Cookbook sre.dns.netbox * 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 08:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P92247 and previous config saved to /var/cache/conftool/dbconfig/20260505-081309-fceratto.json * 08:08 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --broken-only --mediatype AUDIO --mime audio/flac # [[phab:T414641|T414641]] * 08:05 ayounsi@dns1004: END - running authdns-update * 08:03 ayounsi@dns1004: START - running authdns-update * 
08:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92245 and previous config saved to /var/cache/conftool/dbconfig/20260505-080301-fceratto.json * 08:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS trixie * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ulsfo includes - ayounsi@cumin1003" * 08:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2209: Reimage to Trixie * 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2209.codfw.wmnet with reason: Reimage to Trixie * 07:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2209 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92243 and previous config saved to /var/cache/conftool/dbconfig/20260505-075746-marostegui.json * 07:56 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2205 to s3 primary [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92242 and previous config saved to /var/cache/conftool/dbconfig/20260505-075654-marostegui.json * 07:55 awight: EU morning deployment was fun * 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2155 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92241 and previous config saved to 
/var/cache/conftool/dbconfig/20260505-075416-fceratto.json * 07:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance * 07:52 marostegui: Starting s3 codfw failover from db2209 to db2205 - [[phab:T424864|T424864]] * 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2205 with weight 0 [[phab:T424864|T424864]]', diff saved to https://phabricator.wikimedia.org/P92239 and previous config saved to /var/cache/conftool/dbconfig/20260505-075156-marostegui.json * 07:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 [[phab:T424864|T424864]] * 07:50 zabe: zabe@deploy1003:~$ foreachwiki refreshImageMetadata --force --mediatype AUDIO --mime audio/midi # [[phab:T414645|T414645]] * 07:45 zabe: zabe@deploy1003:~$ mwscript namespaceDupes.php scnwiki --fix # [[phab:T425378|T425378]] * 07:36 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2222: after reimage to trixie * 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS trixie * 07:30 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1170: after reimage to trixie * 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS trixie * 07:11 awight@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] (duration: 06m 43s) * 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:07 awight@deploy1003: awight, 1f616emo: Continuing with deployment * 07:06 awight@deploy1003: awight, 1f616emo: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] synced to the testservers 
(see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 awight@deploy1003: Started scap sync-world: Backport for [[gerrit:1281967{{!}}zhwikinews: (2/2) revert 20th anniversary logo change (assets) (T420165)]] * 07:03 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 07:03 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 07:00 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage * 07:00 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1156: after reimage to trixie * 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox * 06:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 06:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS trixie * 06:44 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Reimage to Trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Reimage to Trixie * 06:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2222: Reimage to Trixie * 06:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 
db2222.codfw.wmnet with reason: Reimage to Trixie * 06:14 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1156: after reimage to trixie * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS trixie * 05:49 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:46 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: patterns_as_inline_patterns - oblivian@cumin1003 * 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "patterns_as_inline_patterns - oblivian@cumin1003" * 05:33 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS trixie * 05:31 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1156: Reimage to Trixie * 05:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1156.eqiad.wmnet with reason: Reimage to Trixie * 05:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s2 master: reimage to Debian Trixie * 04:03 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.23 
(duration: 03m 12s) * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 39s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns infor for new switches - pt1979@cumin2002" * 01:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox * 00:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] (duration: 06m 50s) * 00:11 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 00:10 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:09 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282434{{!}}Close Catalan Wikinews (T421796)]]
== 2026-05-04 ==
* 23:48 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:46 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282432{{!}}Close Bosnian Wikinews (T421796)]] * 23:14 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] (duration: 06m 45s) * 23:10 ladsgroup@deploy1003: neriah, ladsgroup: Continuing with deployment * 23:09 ladsgroup@deploy1003: neriah, ladsgroup: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:07 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282410{{!}}Close Hebrew Wikinews (T421796)]] * 22:08 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 22:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 21:32 cwhite@deploy1003: Finished deploy [statsv/statsv@152de49]: fix logging (duration: 00m 11s) * 21:32 cwhite@deploy1003: Started deploy [statsv/statsv@152de49]: fix logging * 21:20 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] (duration: 11m 20s) * 21:16 cjming@deploy1003: cjming, neriah: Continuing with deployment * 21:10 cjming@deploy1003: cjming, neriah: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for 
namespace resolution on hewikis (T412468)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:09 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1276432{{!}}Enable Hebrew keyboard DWIM for namespace resolution on hewikis (T412468)]] * 20:38 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] (duration: 22m 19s) * 20:34 cjming@deploy1003: mmartorana, cjming: Continuing with deployment * 20:18 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:16 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1282385{{!}}Revert^2 "Use js promise for email confirmation banner"]] * 20:11 toyofuku@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] (duration: 07m 21s) * 20:07 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1005.eqiad.wmnet with OS trixie * 20:06 toyofuku@deploy1003: toyofuku: Continuing with deployment * 20:05 toyofuku@deploy1003: toyofuku: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for [[gerrit:1277667{{!}}Enable the reading list beta feature survey on all wikipedias (T421776)]] * 19:51 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) asw1-22-ulsfo.wikimedia.org on all recursors * 19:50 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache asw1-22-ulsfo.wikimedia.org on all recursors * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:49 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: asw1-22-ulsfo - ayounsi@cumin1003" * 19:48 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 19:42 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1005.eqiad.wmnet with reason: host reimage * 19:40 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:37 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:28 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: ongoing troubleshooting * 19:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook 
sre.hosts.move-vlan for host kafka-logging1005 * 19:27 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging1005.eqiad.wmnet with OS trixie * 19:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:23 root@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'. * 19:23 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: remove privatemounts to see if it helps - bking@cumin2002 - [[phab:T424852|T424852]] * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 19:06 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 18:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] (duration: 06m 16s) * 18:55 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 18:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282407{{!}}Close Limburgish Wikinews (T421796)]] * 18:31 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] (duration: 09m 17s) * 18:27 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 18:23 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:22 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282405{{!}}Close Albanian Wikinews (T421796)]] * 18:11 dancy@deploy1003: Finished scap sync-world: testing (duration: 02m 04s) * 18:11 dancy@deploy1003: dancy: Rolling back deployment * 18:10 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 18:09 dancy@deploy1003: Started scap sync-world: testing * 18:08 dancy@deploy1003: Installation of scap version "4.260.0" completed for 2 hosts * 18:06 dancy@deploy1003: Installing scap version "4.260.0" for 2 host(s) * 17:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 
jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 17:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 16:40 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:39 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:34 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:33 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:33 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 16:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] (duration: 06m 19s) * 16:00 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 16:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282384{{!}}Close Greek Wikinews (T421796)]] * 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92224 and previous config saved to /var/cache/conftool/dbconfig/20260504-155514-fceratto.json * 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92223 and previous config saved to /var/cache/conftool/dbconfig/20260504-154506-fceratto.json * 15:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] (duration: 06m 59s) * 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92222 and previous config saved to /var/cache/conftool/dbconfig/20260504-153458-fceratto.json * 15:34 ladsgroup@deploy1003: ladsgroup, chlod: Continuing with deployment * 15:33 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on 39 hosts with reason: switches replacement * 15:33 ladsgroup@deploy1003: ladsgroup, chlod: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync * 15:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync * 15:31 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282060{{!}}Make errorpages responsive]] * 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92221 and previous config saved to /var/cache/conftool/dbconfig/20260504-152449-fceratto.json * 15:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92220 and previous config saved to /var/cache/conftool/dbconfig/20260504-152238-fceratto.json * 15:22 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 15:20 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:17 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 15:17 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. 
* 15:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install5004.wikimedia.org * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92219 and previous config saved to /var/cache/conftool/dbconfig/20260504-151238-fceratto.json * 15:10 papaul: ongoing switch refresh in ULSFO * 15:10 jmm@cumin2002: START - Cookbook sre.dns.netbox * 15:10 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99) * 15:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 15:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] (duration: 06m 45s) * 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92218 and previous config saved to /var/cache/conftool/dbconfig/20260504-150230-fceratto.json * 15:01 ladsgroup@deploy1003: ladsgroup: Continuing with deployment * 15:00 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 14:58 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1282381{{!}}Close Gun Wikinews (T421796)]] * 14:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS trixie * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P92217 and previous config saved to /var/cache/conftool/dbconfig/20260504-145222-fceratto.json * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92216 and previous config saved to /var/cache/conftool/dbconfig/20260504-144213-fceratto.json * 14:41 pt1979@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts * 14:41 pt1979@cumin1003: START - Cookbook sre.hosts.remove-downtime for 7 hosts * 14:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:34 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage * 14:33 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P92215 and previous config saved to /var/cache/conftool/dbconfig/20260504-143334-fceratto.json * 14:33 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance * 14:30 pt1979@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[3-4]-ulsfo IPv6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPv6 with reason: switch refresh * 14:28 pt1979@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on cr[3-4]-ulsfo IPV6,cr[3-4]-ulsfo.mgmt,mr1-ulsfo IPV6 with reason: switch refresh * 14:25 pt1979@cumin1003: DONE (PASS) - 
Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on asw2-ulsfo,cr[3-4]-ulsfo,mr1-ulsfo with reason: switch refresh * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2001 * 14:16 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2001 * 14:13 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2001 * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2001.codfw.wmnet 94.0.192.10.in-addr.arpa 4.9.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:13 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2001 - herron@cumin1003" * 14:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92214 and previous config saved to /var/cache/conftool/dbconfig/20260504-141113-fceratto.json * 14:07 herron@cumin1003: START - Cookbook sre.dns.netbox * 14:04 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2001 * 14:04 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS trixie * 14:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after 
maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92213 and previous config saved to /var/cache/conftool/dbconfig/20260504-140105-fceratto.json * 14:00 slyngshede@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=ulsfo [reason: ulsfo switch refresh [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 14:00 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: New switch configuration, [[phab:T408892|T408892]]] * 13:59 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] (duration: 06m 22s) * 13:57 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5004.wikimedia.org on all recursors * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5004.wikimedia.org - jmm@cumin2002" * 13:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 13:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 13:55 sbisson@deploy1003: sbisson: Continuing with deployment * 13:55 sbisson@deploy1003: sbisson: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 13:54 dcausse: [[phab:T425301|T425301]]: stopping writes again on cloudelastic, cluster unstable * 13:53 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1282354{{!}}ArticleGuidance: enable on simple english (T425351)]] * 13:52 jmm@cumin2002: START - Cookbook sre.dns.netbox * 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5004.wikimedia.org * 13:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P92212 and previous config saved to /var/cache/conftool/dbconfig/20260504-135056-fceratto.json * 13:50 sbisson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] (duration: 07m 30s) * 13:46 sbisson@deploy1003: 1f616emo, sbisson: Continuing with deployment * 13:45 sbisson@deploy1003: 1f616emo, sbisson: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 13:43 sbisson@deploy1003: Started scap sync-world: Backport for [[gerrit:1281965{{!}}zhwikinews: (1/2) revert 20th anniversary logo change (config) (T420165)]] * 13:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92211 and previous config saved to /var/cache/conftool/dbconfig/20260504-134048-fceratto.json * 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1226 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92210 and previous config saved to /var/cache/conftool/dbconfig/20260504-133039-fceratto.json * 13:30 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance * 13:30 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92209 and previous config saved to /var/cache/conftool/dbconfig/20260504-133010-fceratto.json * 13:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92208 and previous config saved to /var/cache/conftool/dbconfig/20260504-132002-fceratto.json * 13:13 moritzm: installing jaraco.context security updates * 13:10 
jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5004.eqsin.wmnet * 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5004.eqsin.wmnet with OS bookworm * 13:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P92207 and previous config saved to /var/cache/conftool/dbconfig/20260504-130953-fceratto.json * 12:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92206 and previous config saved to /var/cache/conftool/dbconfig/20260504-125945-fceratto.json * 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 12:59 dcausse: [[phab:T425301|T425301]]: resuming writes on cloudelastic * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1214 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92205 and previous config saved to /var/cache/conftool/dbconfig/20260504-125247-fceratto.json * 12:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance * 12:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92204 and previous config saved to /var/cache/conftool/dbconfig/20260504-125219-fceratto.json * 12:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance * 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 
2:00:00 on durum5004.eqsin.wmnet with reason: host reimage * 12:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92203 and previous config saved to /var/cache/conftool/dbconfig/20260504-124210-fceratto.json * 12:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P92202 and previous config saved to /var/cache/conftool/dbconfig/20260504-123203-fceratto.json * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92201 and previous config saved to /var/cache/conftool/dbconfig/20260504-122155-fceratto.json * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1203 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92200 and previous config saved to /var/cache/conftool/dbconfig/20260504-121441-fceratto.json * 12:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance * 12:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92199 and previous config saved to /var/cache/conftool/dbconfig/20260504-121424-fceratto.json * 12:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92198 and previous config saved to /var/cache/conftool/dbconfig/20260504-120416-fceratto.json * 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5004.eqsin.wmnet with OS bookworm * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 
jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5004.eqsin.wmnet on all recursors * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P92197 and previous config saved to /var/cache/conftool/dbconfig/20260504-115408-fceratto.json * 11:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5004.eqsin.wmnet - jmm@cumin2002" * 11:47 jmm@cumin2002: START - Cookbook sre.dns.netbox * 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5004.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5003.eqsin.wmnet * 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5003.eqsin.wmnet with OS bookworm * 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92196 and previous config saved to /var/cache/conftool/dbconfig/20260504-114400-fceratto.json * 11:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1193 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92195 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-113620-fceratto.json * 11:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92194 and previous config saved to /var/cache/conftool/dbconfig/20260504-113550-fceratto.json * 11:27 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: after reimage to trixie * 11:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5003.eqsin.wmnet with reason: host reimage * 11:25 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92192 and previous config saved to /var/cache/conftool/dbconfig/20260504-112542-fceratto.json * 11:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P92191 and previous config saved to /var/cache/conftool/dbconfig/20260504-111534-fceratto.json * 11:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92189 and previous config saved to /var/cache/conftool/dbconfig/20260504-110526-fceratto.json * 11:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2187: repool after maintenance * 10:58 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1192 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92187 and previous config saved to /var/cache/conftool/dbconfig/20260504-105808-fceratto.json * 10:58 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance * 10:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92186 and previous config saved to /var/cache/conftool/dbconfig/20260504-105739-fceratto.json * 10:48 moritzm: installing bash updates from trixie point release * 10:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92184 and previous config saved to /var/cache/conftool/dbconfig/20260504-104731-fceratto.json * 10:42 moritzm: installing postgresql-17 security updates * 10:42 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: after reimage to trixie * 10:39 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS trixie * 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum5003.eqsin.wmnet with OS bookworm * 10:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P92181 and previous config saved to /var/cache/conftool/dbconfig/20260504-103723-fceratto.json * 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum5003.eqsin.wmnet on all recursors * 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:34 jmm@cumin2002: 
END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum5003.eqsin.wmnet - jmm@cumin2002" * 10:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92179 and previous config saved to /var/cache/conftool/dbconfig/20260504-102715-fceratto.json * 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox * 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum5003.eqsin.wmnet * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1178 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92178 and previous config saved to /var/cache/conftool/dbconfig/20260504-101855-fceratto.json * 10:18 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92177 and previous config saved to /var/cache/conftool/dbconfig/20260504-101826-fceratto.json * 10:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2187: repool after maintenance * 10:16 marostegui@cumin1003: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:15 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: host reimage * 10:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92174 and previous config saved to 
/var/cache/conftool/dbconfig/20260504-100818-fceratto.json * 10:02 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS trixie * 10:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1162: Reimage to Trixie * 10:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1162.eqiad.wmnet with reason: Reimage to Trixie * 09:58 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P92172 and previous config saved to /var/cache/conftool/dbconfig/20260504-095810-fceratto.json * 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5005.wikimedia.org * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92171 and previous config saved to /var/cache/conftool/dbconfig/20260504-094802-fceratto.json * 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5005.wikimedia.org * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92170 and previous config saved to /var/cache/conftool/dbconfig/20260504-093938-fceratto.json * 09:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92169 and previous config saved to /var/cache/conftool/dbconfig/20260504-093910-fceratto.json * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 09:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1182: after reimage to trixie * 09:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92167 and previous config saved to /var/cache/conftool/dbconfig/20260504-092902-fceratto.json * 09:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P92165 and previous config saved to /var/cache/conftool/dbconfig/20260504-091853-fceratto.json * 09:16 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2187: Fixing events * 09:15 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2187: Fixing events * 09:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2187.codfw.wmnet with reason: Checking events * 09:08 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92163 and previous config saved to /var/cache/conftool/dbconfig/20260504-090845-fceratto.json * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1172 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92161 and previous config saved to /var/cache/conftool/dbconfig/20260504-085930-fceratto.json * 08:59 fceratto@cumin1003: DONE (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance * 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92160 and previous config saved to /var/cache/conftool/dbconfig/20260504-085912-fceratto.json * 08:56 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1182: after reimage to trixie * 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92158 and previous config saved to /var/cache/conftool/dbconfig/20260504-084904-fceratto.json * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1008.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1007.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1006.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1005.eqiad.wmnet * 08:43 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1004.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1003.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-worker1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: 
tools-k8s-worker1001.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1002.eqiad.wmnet * 08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: tools-k8s-ctrl1001.eqiad.wmnet * 08:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P92157 and previous config saved to /var/cache/conftool/dbconfig/20260504-083857-fceratto.json * 08:37 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS trixie * 08:32 moritzm: installing Linux 5.10.251-3 on bullseye hosts * 08:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92156 and previous config saved to /var/cache/conftool/dbconfig/20260504-082849-fceratto.json * 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet * 08:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1167 ([[phab:T419961|T419961]])', diff saved to https://phabricator.wikimedia.org/P92155 and previous config saved to /var/cache/conftool/dbconfig/20260504-082024-fceratto.json * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance * 08:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance * 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet * 08:15 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:11 marostegui@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on db1182.eqiad.wmnet with reason: host reimage * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 08:04 gkyziridis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync * 08:04 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] (duration: 07m 58s) * 08:03 gkyziridis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: sync * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:02 gkyziridis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: sync * 08:02 gkyziridis@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: sync * 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet * 08:01 moritzm: installing Linux 6.1.170 on bookworm hosts * 07:59 urbanecm@deploy1003: urbanecm, h2o: Continuing with deployment * 07:57 urbanecm@deploy1003: urbanecm, h2o: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:57 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1212: after reimage to trixie * 07:56 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1277256{{!}}Add sva to wmgExtraLanguageNames (T407106)]] * 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet * 07:55 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS trixie * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:51 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:48 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:47 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1182: Reimage to Trixie * 07:47 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1182: Reimage to Trixie * 07:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1182.eqiad.wmnet with reason: Reimage to Trixie * 07:44 dcausse: [[phab:T425301|T425301]]: stopping writes on cloudelastic * 07:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 07:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2147.codfw.wmnet * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:42 marostegui@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all 
IPs except the asset tag one - marostegui@cumin1003" * 07:42 marostegui@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2147.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1003" * 07:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2149: after reimage to trixie * 07:40 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1188: after reimage to trixie * 07:38 moritzm: installing Linux 6.12.85 on trixie hosts * 07:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo2003.codfw.wmnet * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo2003.codfw.wmnet * 07:33 marostegui@cumin1003: START - Cookbook sre.dns.netbox * 07:28 marostegui@cumin1003: START - Cookbook sre.hosts.decommission for hosts db2147.codfw.wmnet * 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org * 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org * 07:11 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1212: after reimage to trixie * 07:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1212.eqiad.wmnet with OS trixie * 06:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2149: after reimage to trixie * 06:55 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1188: after reimage to trixie * 06:52 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1188.eqiad.wmnet 
with OS trixie * 06:47 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2149.codfw.wmnet with OS trixie * 06:43 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: host reimage * 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:25 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:21 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1212.eqiad.wmnet with OS trixie * 06:19 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1188.eqiad.wmnet with reason: host reimage * 06:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: host reimage * 06:11 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1212: Reimage to Trixie * 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1212.eqiad.wmnet with reason: Reimage to Trixie * 06:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 13 hosts with reason: Sanitarium s3 master: reimage to Debian Trixie * 06:09 marostegui: Reimage sanitarium master for s3, lag to be expected on wikireplicas for s3 [[phab:T424792|T424792]] * 06:05 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1188.eqiad.wmnet with OS trixie * 06:02 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1188: Reimage to Trixie * 05:58 marostegui@cumin1003: START - Cookbook 
sre.mysql.depool depool db1188: Reimage to Trixie
* 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1188.eqiad.wmnet with reason: Reimage to Trixie
* 05:57 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2149.codfw.wmnet with OS trixie
* 05:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2149: Reimage to Trixie
* 05:55 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2149: Reimage to Trixie
* 05:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2149.codfw.wmnet with reason: Reimage to Trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 36s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
== 2026-05-03 ==
* 14:11 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] (duration: 10m 51s)
* 14:05 ladsgroup@deploy1003: ladsgroup: Continuing with deployment
* 14:04 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1281991{{!}}Disable uploads in scnwiki (T425278)]]
* 12:27 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]] (duration: 29m 22s)
* 11:58 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281963{{!}}Remove Wikinews from installer's default main page]]
== 2026-05-02 ==
* 23:32 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] (duration: 06m 41s)
* 23:28 zabe@deploy1003: dreamyjazz, zabe: Continuing with deployment
* 23:27 zabe@deploy1003: dreamyjazz, zabe: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:26 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281775{{!}}Uninstall DynamicPageList from wikis it's not used on (T425202)]]
* 23:22 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] (duration: 07m 27s)
* 23:18 zabe@deploy1003: zabe, dreamyjazz: Continuing with deployment
* 23:17 zabe@deploy1003: zabe, dreamyjazz: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:15 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1281739{{!}}Uninstall DynamicPageList from officewiki (T425154)]] * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2014.codfw.wmnet with OS trixie * 18:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb2013.codfw.wmnet with OS trixie * 18:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host rdb2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2369.codfw.wmnet with OS trixie * 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2369.codfw.wmnet with reason: host reimage * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2374.codfw.wmnet with OS trixie * 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: 
END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2373.codfw.wmnet with OS trixie * 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2372.codfw.wmnet with OS trixie * 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2371.codfw.wmnet with OS trixie * 17:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2370.codfw.wmnet with OS trixie * 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - 
jhancock@cumin2002" * 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:44 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2374.codfw.wmnet with reason: host reimage * 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2373.codfw.wmnet with reason: host reimage * 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2372.codfw.wmnet with reason: host reimage * 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2371.codfw.wmnet with reason: host reimage * 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2374.codfw.wmnet with OS trixie * 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2373.codfw.wmnet with OS trixie * 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2372.codfw.wmnet with OS trixie * 16:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2370.codfw.wmnet with reason: host reimage * 16:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 
jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2371.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2370.codfw.wmnet with OS trixie * 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2369.codfw.wmnet with OS trixie * 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2366.codfw.wmnet with OS trixie * 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2367.codfw.wmnet with OS trixie * 15:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2368.codfw.wmnet with OS trixie * 15:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 15:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host 
wikikube-worker2367.codfw.wmnet with OS trixie * 15:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 12:02 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] (duration: 13m 06s) * 11:57 samtar@deploy1003: samtar: Continuing with deployment * 11:50 samtar@deploy1003: samtar: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 11:49 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281747{{!}}Watchlist star: Revert popover/dialog changes (T425185)]] * 09:20 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply * 09:19 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2366.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2368.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2367.codfw.wmnet with OS trixie * 02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - 
Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 31s) * 02:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2368.codfw.wmnet with reason: host reimage * 01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2367.codfw.wmnet with reason: host reimage * 01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2366.codfw.wmnet with reason: host reimage * 01:37 jhancock@cumin2002: START 
- Cookbook sre.hosts.reimage for host wikikube-worker2368.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2367.codfw.wmnet with OS trixie * 01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2366.codfw.wmnet with OS trixie * 01:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2364.codfw.wmnet with OS trixie * 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2365.codfw.wmnet with OS trixie * 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2363.codfw.wmnet with OS trixie * 01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker2364.codfw.wmnet with reason: host reimage * 01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2365.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2364.codfw.wmnet with reason: host reimage * 00:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2363.codfw.wmnet with reason: host reimage * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2365.codfw.wmnet with OS trixie * 00:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2364.codfw.wmnet with OS trixie * 00:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2363.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2362.codfw.wmnet with OS trixie * 00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2361.codfw.wmnet with OS trixie * 00:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:02 
jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2360.codfw.wmnet with OS trixie * 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 00:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" == 2026-05-01 == * 23:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2362.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2361.codfw.wmnet with reason: host reimage * 23:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2360.codfw.wmnet with reason: host reimage * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2362.codfw.wmnet with OS trixie * 23:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2361.codfw.wmnet with OS trixie * 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2360.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host wikikube-worker2357.codfw.wmnet with OS trixie * 23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2359.codfw.wmnet with OS trixie * 23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2358.codfw.wmnet with OS trixie * 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002" * 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 23:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: 
START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2359.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2357.codfw.wmnet with reason: host reimage * 22:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2358.codfw.wmnet with reason: host reimage * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2359.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2358.codfw.wmnet with OS trixie * 22:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2357.codfw.wmnet with OS trixie * 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2374.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2373.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2372.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2371.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2370.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2369.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:46 jhancock@cumin2002: END (PASS) - Cookbook 
sre.hosts.provision (exit_code=0) for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2368.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2367.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2366.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2365.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2364.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2363.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2362.mgmt.codfw.wmnet with chassis set policy 
FORCE_RESTART * 21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2361.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2360.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2359.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2358.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host 
wikikube-worker2357.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2374 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2374 * 21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2373 * 21:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2373 * 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2372 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2372 * 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2371 * 20:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2371 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2370 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2370 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2369 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2369 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2368 * 20:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2368 * 20:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2367 * 20:56 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2367 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2366 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2366 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2365 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2365 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2364 * 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2364 * 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2363 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2363 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2362 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2362 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2361 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2361 * 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2360 * 20:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2360 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2359 * 20:54 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host wikikube-worker2359 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2358 * 20:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2358 * 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2357 * 20:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2357 * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 20:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2357 to codfw - jhancock@cumin2002" * 20:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 20:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS trixie * 20:06 krinkle@deploy1003: Finished scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] (duration: 15m 27s) * 20:02 krinkle@deploy1003: krinkle: Continuing with deployment * 19:54 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:52 krinkle@deploy1003: krinkle: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 19:51 krinkle@deploy1003: Started scap sync-world: Backport for [[gerrit:1269440{{!}}Enable wgTrackMediaRequestProvenance on wikidata.org (T414338)]], [[gerrit:1269441{{!}}Enable wgTrackMediaRequestProvenance on Commons (T414338)]] * 19:49 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage * 19:40 dancy@deploy1003: Finished scap sync-world: testing [[phab:T317405|T317405]] (duration: 03m 23s) * 19:37 dancy@deploy1003: Started scap sync-world: testing [[phab:T317405|T317405]] * 19:36 dancy@deploy1003: Installation of scap version "4.259.0" completed for 2 hosts * 19:34 dancy@deploy1003: Installing scap version "4.259.0" for 2 host(s) * 18:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'. * 18:55 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'. * 18:43 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alangi Derick out of all services on: 2442 hosts * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2002 * 18:41 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2002 * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2002.codfw.wmnet 50.16.192.10.in-addr.arpa 0.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 18:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:40 herron@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2002 - herron@cumin1003" * 18:36 herron@cumin1003: START - Cookbook sre.dns.netbox * 18:33 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2002 * 18:32 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS trixie * 18:26 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS trixie * 18:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 18:00 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2003 * 17:41 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2003 * 17:40 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2003 * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2003.codfw.wmnet 24.32.192.10.in-addr.arpa 4.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:40 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:40 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2003 - herron@cumin1003" * 17:33 herron@cumin1003: START - Cookbook sre.dns.netbox * 17:28 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2003 * 17:28 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS trixie * 17:15 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS trixie * 16:34 cdobbins@cumin2002: conftool action : get/pooled; selector: name=cp5024.eqsin.wmnet * 16:30 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply * 16:30 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply * 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet * 16:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet * 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet * 15:59 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 15:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet * 15:47 dancy@deploy1003: Installation of scap version "4.258.1" completed for 2 hosts * 15:45 dancy@deploy1003: Installing scap version "4.258.1" for 2 host(s) * 15:34 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:30 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host kafka-logging2004 * 15:14 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004 * 15:11 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004 * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: START - Cookbook sre.dns.wipe-cache kafka-logging2004.codfw.wmnet 38.16.192.10.in-addr.arpa 8.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:11 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:11 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host kafka-logging2004 - herron@cumin1003" * 15:05 dancy@deploy1003: Installation of scap version "4.258.0" completed for 2 hosts * 15:03 dancy@deploy1003: Installing scap version "4.258.0" for 2 host(s) * 14:57 
herron@cumin1003: START - Cookbook sre.dns.netbox * 14:47 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host kafka-logging2004 * 14:47 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS trixie * 13:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 13:44 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 13:24 _Gerges: WikiMonitor setup * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1080 * 13:09 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1078 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1079 * 13:09 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1077 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1080 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1079 * 13:09 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1078 * 13:08 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1077 * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1079.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1078.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1077.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host cloudvirt1080.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1077 to eqiad - jclark@cumin1003" * 13:00 jclark@cumin1003: START - Cookbook sre.dns.netbox * 12:34 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply * 12:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply * 12:33 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply * 09:57 samtar@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] (duration: 06m 49s) * 09:53 samtar@deploy1003: samtar: Continuing with deployment * 09:52 samtar@deploy1003: samtar: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 09:50 samtar@deploy1003: Started scap sync-world: Backport for [[gerrit:1281423{{!}}Switch watchstar from Popover to Dialog (T417847)]] * 09:38 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] (duration: 06m 05s) * 09:32 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1281426{{!}}Update the interwiki cache (T239173)]] * 08:13 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 08:13 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 08:12 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 03:26 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-feature-counts-change-enrich: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 41s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:16 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] (duration: 07m 05s) * 00:13 zabe@deploy1003: zabe: Continuing with deployment * 00:11 zabe@deploy1003: zabe: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:09 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1280417{{!}}Add script to fix fr_deleted drifts (T424553)]] == Other archives == See [[Server Admin Log/Archives]]. <noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 0gzexaukbqax4i4aj5bndf2yptdhwe6 Release Engineering/SAL 0 17290 2414257 2413351 2026-05-15T13:02:37Z Stashbot 7414 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1287828 T426392 2414257 wikitext text/x-wiki === 2026-05-15 === * 13:02 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1287828 [[phab:T426392|T426392]] === 2026-05-13 === * 12:42 James_F: Zuul: [mediawiki/extensions/Springboard] Add AdminLinks Phan dependency * 12:42 James_F: Zuul: [mediawiki/extensions/ChatBot] Add dependencies on VisualEditor and BlueSpiceFoundation * 12:42 James_F: Zuul: [mediawiki/extensions/ChatIntegration] Add dependency on VisualEditor * 12:37 James_F: Zuul: [mediawiki/extensions/WikiLambda] Drop AF and SB deps down to phan-only, for [[phab:T423180|T423180]] === 2026-05-12 === * 20:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/104 ([[phab:T424774|T424774]]) * 18:08 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add AF and SB deps for [[phab:T423180|T423180]] * 14:18 atsukoito: PrivateSettings: empty $wgOpensearchCredentials for opensearch-on-k8s synced to deploy04 by Reedy * 13:04 atsukoito: PrivateSettings: credentials for opensearch-on-k8s ttmserver-test * 11:50 James_F: Zuul: [machinelearning/liftwing/inference-services] Add qwen36 llm model CI/CD pipelines, for [[phab:T425680|T425680]] * 11:46 James_F: Zuul: Add experimental php-pie-build* jobs to other PHP extensions, for [[phab:T425943|T425943]] * 11:37 James_F: Zuul: [mediawiki/php/wikidiff2] Add experimental php-pie-build* jobs, for [[phab:T425943|T425943]] * 10:05 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u 
jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-with-Wikibase-extensions-browser-tests-only-vendor-php83 # fix failure seen in quibble-with-Wikibase-extensions-browser-tests-only-vendor-php83 7817 * 08:44 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/ # broken Cypress cache? hopefully fix failure seen in quibble-vendor-mysql-php83-selenium 51633 === 2026-05-11 === * 18:28 James_F: Docker: Add changes to php-compile images for PIE, for [[phab:T425943|T425943]] * 16:06 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/ # broken Cypress cache? hopefully fix failure seen in quibble-vendor-mysql-php83-selenium 51439 and 51452 === 2026-05-09 === * 20:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1285498 === 2026-05-07 === * 22:53 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/105 === 2026-05-06 === * 18:13 bd808: Unblock 88.165.192.0/19 * 18:03 bd808: Unblock 94.208.0.0/14 * 17:56 bd808: Unblock 84.226.0.0/16 * 17:41 bd808: Unblock 94.34.0.0/16 * 17:35 bd808: Unblock 109.134.0.0/16 === 2026-05-05 === * 21:20 James_F: Zuul: Provide Node 26 experimental jobs everywhere needed * 21:04 James_F: Docker: Provide initial Node 26 images * 19:01 James_F: Zuul: [mediawiki/extensions/PageAssessments] Add Scribunto dependency, for [[phab:T396135|T396135]] * 14:58 dancy: rm /var/log/<nowiki>{</nowiki>user.log.1,syslog.1,messages.1<nowiki>}</nowiki> on deployment-eventgate-4.deployment-prep ([[phab:T425429|T425429]]) === 2026-05-04 === * 15:19 dancy: Upgrading gitlab cloud runners (prod) from 1.35.1-do.3 to 1.35.1-do.5 * 14:51 dancy: Upgrading gitlab cloud runners (staging) from
1.35.1-do.3 to 1.35.1-do.5 * 10:40 James_F: Zuul: Provide non-voting PHP 8.4/8.5 Quibble jobs for bluespice template === 2026-05-02 === * 20:49 James_F: Zuul: [mediawiki/core] Enforce PHP 8.4 & 8.5 on release branches, all pass * 19:27 James_F: Zuul: Provide non-voting PHP 8.4/8.5 Quibble jobs for MW release branches * 19:19 James_F: Zuul: [mediawiki/extensions/BlogPage] Add dependencies * 16:48 James_F: Hard-restarting Zuul to clear the huge number of i18n updates being re-submitted. * 15:48 James_F: Zuul: [wikimedia-cz/*] Test in PHP 8.3+, dropping 8.2 * 14:02 TheresNoTime: Add bvibber to deployment-prep project * 09:08 James_F: Docker: [quibble-*] Add php-luasandbox so we can test both modes in Scribunto === 2026-05-01 === * 15:42 James_F: Zuul: [wikimedia/lucene-explain-parser] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: Zuul: [wikimedia/textcat] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: Zuul: [mediawiki/tools/ParseWiki] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: zuul: Add ToprakM to CI allowlist * 15:19 James_F: Zuul: [translatewiki] Test in PHP 8.3+, dropping 8.2 * 15:10 James_F: Zuul: [mediawiki/extensions/WikiEditor] Add TestKitchen as a dependency, for [[phab:T425076|T425076]] * 12:40 James_F: Zuul: [mediawiki/tools/code-utils] Test in PHP 8.3+, dropping 8.2 * 08:02 James_F: Zuul: Update xtex's e-mail in the allowlist * 07:37 James_F: Zuul: Switch release branches' selenium jobs to PHP 8.3 * 07:33 James_F: Zuul: Test Wikimedia production libraries in PHP 8.3+, dropping 8.2 === 2026-04-30 === * 21:36 brennen: gitlab-webhooks: building & restarting to deploy https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/40 * 20:26 James_F: Zuul: [mediawiki/tools/api-testing] Make PHP 8.5 CI voting * 20:16 James_F: jforrester@doc1004:~$ # sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn/ # [[phab:T415832|T415832]] * 20:14 James_F: Zuul: [mediawiki/extensions/WebAuthn] Archive, for [[phab:T415832|T415832]] / 
[[phab:T303495|T303495]] * 17:16 brennen: wikibugs: most maintainers at hackathon, so go release-engineering added as a maintainer while looking to debug error at https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/jobs/810904 * 15:19 mutante: upgrading zuul to 14.2.0-1 on "new zuul" machines ([[phab:T424879|T424879]]) === 2026-04-29 === * 15:49 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add ConfirmEdit dependency, for [[phab:T424597|T424597]] * 15:36 James_F: Zuul: Drop experimental node22 jobs, never used in practice * 15:28 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1279392, https://gerrit.wikimedia.org/r/1279397 === 2026-04-28 === * 18:11 bd808: Unblock 86.0.0.0/16 * 17:41 bd808: Unblock 79.192.0.0/10 * 17:07 James_F: Zuul: [mediawiki/tools/phpunit-patch-coverage] Drop PHP 8.2 testing * 17:07 James_F: Zuul: [mediawiki/tools/minus-x] Drop PHP 8.2 testing * 17:07 James_F: Zuul: [mediawiki/tools/codesniffer] Drop PHP 8.2 testing * 16:32 James_F: Zuul: [mediawiki/services/jobrunner] Drop PHP 8.2 testing * 13:34 James_F: Zuul: [mediawiki/tools/phan] Drop PHP 8.2 testing * 13:34 James_F: Zuul: [oojs/ui] Drop PHP 8.2 testing * 13:14 James_F: Zuul: [mediawiki/tools/phan/SecurityCheckPlugin] Drop PHP 8.2 CI * 10:40 Silvan_WMDE: sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwext-node24-rundoc/ # run on integration-castor06.integration.eqiad1.wikimedia.cloud to fix failure seen in mwext-node24-rundoc #1717 * 00:03 bd808: Increase parallelism for wmf-beta-update-databases.py ([[phab:T256168|T256168]]) === 2026-04-27 === * 22:11 bd808: Beta Cluster MediaWiki update logs now available via https://beta-update.wmcloud.org/ ([[phab:T256168|T256168]]) * 21:57 bd808: Add web security group to deployment-deploy04 ([[phab:T256168|T256168]]) * 20:45 James_F: Zuul: Restrict mw*-codehealth-patch jobs to master only, for [[phab:T424573|T424573]] * 17:16 James_F: Docker: [mediawiki-phan-taint-check-demo] Re-platform to 
Trixie and so PHP 8.4 * 15:53 James_F: Zuul: [mediawiki/extensions/ReportIncident] Add TestKitchen phan dependency, for [[phab:T424220|T424220]] * 14:32 James_F: Zuul: Drop PHP 8.2 enforcement from MediaWiki things for master and REL1_46 for [[phab:T358667|T358667]] * 12:38 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mwext-node24-docs-publish # fix failure seen in mwext-node24-docs-publish 383 * 09:18 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover/mediawiki-libs-node-cssjanus/ # [[phab:T424419|T424419]] === 2026-04-26 === * 20:49 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1276777 === 2026-04-24 === * 22:48 dduvall: merged zuul3 branch of integration/config into master and pushed (in preparation for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1277198) * 12:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1276428 === 2026-04-23 === * 23:57 bd808: Set `profile::beta::autoupdater::run_updater: true` for deployment-deploy04 via Horizon ([[phab:T256168|T256168]]) * 22:58 bd808: bd808@deployment-deploy04 `sudo -u jenkins-deploy /usr/local/bin/wmf-beta-update-all` * 22:36 bd808: bd808@deployment-deploy04 `sudo -u mwdeploy /usr/local/bin/wmf-beta-update-all` * 22:16 bd808: Disabled https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad so that replacement script can be tested ([[phab:T256168|T256168]]) * 22:12 bd808: Disabled https://integration.wikimedia.org/ci/job/beta-code-update-eqiad so that replacement script can be tested ([[phab:T256168|T256168]]) * 22:02 bd808: Cherry-picked {{gerrit|1276813}} to deployment-puppetserver-1 ([[phab:T256168|T256168]]) * 20:11 James_F: Zuul: [wikibase/*] Replace CI testing in Node 20 with Node 24 * 20:11 James_F: Zuul: [wikidata/query/*] Replace CI testing in Node 20 with Node 24 * 20:11 James_F: Zuul: [analytics/*] Replace CI testing in 
Node 20 with Node 24 * 20:10 James_F: Zuul: [mediawiki/tools/*] Replace CI testing in Node 20 with Node 24 * 20:06 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.34.5-do.3 to 1.35.1-do.3 ([[phab:T423726|T423726]]) * 19:55 James_F: Zuul: [jquery-client] Replace CI testing in Node 20 with Node 24 * 19:51 James_F: Zuul: [wikipeg] Drop testing in Node 20 and Node 22 * 19:47 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.34.5-do.3 to 1.35.1-do.3 ([[phab:T423726|T423726]]) * 19:37 James_F: Zuul: [oojs/ui] Drop CI testing in Node 20 and Node 22 * 19:37 James_F: Zuul: [oojs/js] Drop CI testing in Node 20 and Node 22 * 19:37 James_F: Zuul: [unicodejs] Replace CI testing in Node 20 with Node 24 * 19:36 James_F: Zuul: [wikimedia/portals] Drop CI testing in Node 20 and Node 22 * 18:57 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.33.9-do.3 to 1.34.5-do.3 ([[phab:T423726|T423726]]) * 18:39 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.33.9-do.3 to 1.34.5-do.3 ([[phab:T423726|T423726]]) * 18:18 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.33.9-do.2 to 1.33.9-do.3 ([[phab:T423726|T423726]]) * 17:58 James_F: Zuul: [mediawiki/extensions/OAuth] Add dependency on CentralAuth, for [[phab:T415281|T415281]] * 17:56 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.32.13-do.2 to 1.33.9-do.3 ([[phab:T423726|T423726]]) * 16:35 James_F: Zuul: Enforce PHP 8.5 CI for MW things in master (and REL1_46), for [[phab:T411814|T411814]] * 16:19 James_F: Zuul: [mediawiki/services/parsoid] Enable PHP 8.5 CI * 15:47 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add AntiSpoof dependency, for [[phab:T420548|T420548]] * 14:20 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24 # fix failure seen in mediawiki-node24 8385 and 8405 * 12:56 James_F: Zuul: [mediawiki/extensions/GrowthExperiments] Add CentralNotice 
dependency, for [[phab:T422082|T422082]] === 2026-04-22 === * 00:07 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add MF dependency, for [[phab:T424113|T424113]] === 2026-04-21 === * 23:26 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add CommunityConfiguration dep too, for [[phab:T394410|T394410]] * 23:17 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add standalone test jobs, for [[phab:T422031|T422031]] * 20:47 inflatador: updating cirrussearch hosts to Trixie/OpenSearch 2 [[phab:T421763|T421763]] * 20:38 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add CommunityConfiguration phan dep, for [[phab:T394410|T394410]] * 20:17 bd808: Running tofu for [[phab:T421244|T421244]] * 18:00 James_F: Zuul: [mediawiki/extensions/WatchAnalytics] Add ApprovedRevs Phan dependency * 16:35 bd808: Unblock 79.116.0.0/16 * 13:34 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add TestKitchen phan dep, for [[phab:T415254|T415254]] * 13:27 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add CentralAuth dependency, for [[phab:T420548|T420548]] === 2026-04-20 === * 23:56 bd808: Unblock 76.157.0.0/16 * 18:28 dancy: Upgrading gitlab cloud runners (staging) to 1.33.9-do.2 ([[phab:T423726|T423726]]) * 18:28 dancy: Upgrading gitlab cloud runners (staging) ([[phab:T423726|T423726]]) * 18:19 James_F: jjb: All 486 (!) 
jobs now updated for [[phab:T423622|T423622]] * 18:18 bd808: Unblock 113.128.0.0/15 * 15:03 James_F: Docker: Bump ci-bullseye/-bookworm/-trixie for mirrors.wm.org removal, [[phab:T423622|T423622]] === 2026-04-19 === * 19:53 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272752 === 2026-04-17 === * 21:07 thcipriani: marking integration-agent-1080 offline for experimentation * 19:30 thcipriani: reconfiguring castor-save-workspace-cache with https://gerrit.wikimedia.org/r/1273935 * 17:47 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.32.10-do.1 to 1.32.13-do.2 ([[phab:T423726|T423726]]) * 16:49 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.32.10-do.1 to 1.32.13-do.2 ([[phab:T423726|T423726]]) === 2026-04-16 === * 20:49 dduvall: creating integration/zuul-jobs repo to serve as a mirror of opendev.org/zuul/zuul-jobs ([[phab:T406384|T406384]]) * 13:38 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272711 [[phab:T423568|T423568]] * 11:07 Silvan_WMDE: sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24/ # run on integration-castor06.integration.eqiad1.wikimedia.cloud === 2026-04-15 === * 20:05 James_F: Zuul: Configure REL1_46 CI, for [[phab:T423257|T423257]] * 17:44 bd808: Unblock 176.0.0.0/13 * 17:39 bd808: Unblock 46.128.0.0/16 * 17:32 bd808: Unblock 176.86.0.0/16 * 16:39 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/commit/127d783b2176ac60b646a5fa4f1b1a872ca66340 * 15:33 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/100 * 01:02 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/99 === 2026-04-14 === * 20:42 James_F: Docker: [composer-scratch] Upgrade composer to 2.9.7 and cascade * 16:35 bd808: Unblock 88.112.0.0/14 * 00:48 bd808: Unblock 24.6.0.0/16 
* 00:42 bd808: Unblock 152.231.48.0/20 === 2026-04-13 === * 22:00 James_F: Zuul: [mediawiki/vendor] Drop accidental Wikibase browser tests on branches * 20:28 James_F: Zuul: [mediawiki/extensions/Chart] Drop Doxygen publish job, not used * 14:42 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add FlaggedRevs dep, for [[phab:T421011|T421011]] === 2026-04-12 === * 18:21 James_F: jforrester@contint1002:~$ sudo /usr/sbin/service zuul restart && tail -f -n100 /var/log/zuul/zuul.log # [[phab:T423027|T423027]] === 2026-04-10 === * 23:22 James_F: jforrester@contint1002:~$ zuul enqueue --trigger gerrit --pipeline postmerge --project mediawiki/extensions/ReadingLists --change {{Gerrit|1269498}},2 # [[phab:T422976|T422976]] * 23:20 James_F: Zuul: [mediawiki/extensions/ReadingLists] Publish JS coverage, for [[phab:T422976|T422976]] * 23:13 James_F: Zuul: Migrate a few straggler Node 20 MediaWiki things to Node 24 * 23:01 James_F: Zuul: Move all MediaWiki things from mediawiki-node20 to mediawiki-node24 * 21:59 James_F: Docker: Bump Node base images to March releases and cascade; Upgrade Quibble images from Node 20 to Node 24 * 10:24 hashar: Updating all Quibble jobs to 1.17.1 * 10:22 hashar: Updated PostgreSQL jobs to Quibble 1.17.1 # [[phab:T422110|T422110]] * 10:22 hashar: Updated apitesting job to Quibble 1.17.1 # [[phab:T422843|T422843]] [[phab:T418743|T418743]] * 09:51 hashar: Tag Quibble 1.17.1 @ {{Gerrit|0a1ab3b7c3dfee36c9bc2e9b049957d94e190e85}} === 2026-04-09 === * 15:13 hashar: Rolling back Quibble jobs to 1.16.0 (api-testing stage fails due to missing npm install step) * 14:58 hashar: Upgrading Quibble jobs to 1.17.0 * 14:23 hashar: Tagged Quibble 1.17.0 @ {{Gerrit|864381c6b63bdbcd8c74a3162c406fffcaaf8694}} * 07:48 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1268559 "Zuul: use standalone jobs for GrowthExperiments Cypress tests" {{!}} [[phab:T417412|T417412]] === 2026-04-08 === * 22:19 dancy: Updating docker-pkg
files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/1269068 * 22:01 bd808: Unblock 95.216.12.170/32 ([[phab:T422751|T422751]]) * 19:26 brennen: gitlab-webhooks: building & deploying https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/37 - hitting some build tooling stuff, trying a fix per instructions in the error log * 17:54 bd808: Unblock 167.56.0.0/13 ([[phab:T422721|T422721]]) * 06:31 hashar: Deleted integration-agent-castor05 Bullseye instance, replaced by integration-agent-castor06 which is on Bookworm # [[phab:T421114|T421114]] * 06:24 hashar: Deleted integration-agent-qemu-1003 Bullseye image, replaced by integration-agent-qemu-1004 which is on Bookworm # [[phab:T422488|T422488]] === 2026-04-07 === * 22:25 dduvall: adding new pipelinelib labels to ci nodes ([[phab:T422234|T422234]]) * 20:05 hashar: Triggered a build of https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/ * 17:06 dduvall: added `Docker` label to `contint` jenkins nodes ([[phab:T422507|T422507]]) * 17:05 dduvall: restored missing `pipelinelib` labels on `integration-agent-docker-` CI hosts ([[phab:T422507|T422507]]) * 16:53 bd808: Unblock 73.0.0.0/8 ([[phab:T422498|T422498]]) * 12:36 hashar: jjb: use $CASTOR_HOST for Quibble success cache. https://gerrit.wikimedia.org/r/1268545 {{!}} This causes the Quibble jobs to use a new instance for the success cache, which is empty # [[phab:T383243|T383243]] [[phab:T421114|T421114]] * 12:17 hashar: Migrated Castor from integration-castor05 to integration-castor06. 
Updated CASTOR_HOST in Jenkins and moved the Cinder volume to the new instance # [[phab:T421114|T421114]] * 11:14 hashar: Added Bookworm based Jenkins agents to the pool Hostnames 1090, 1091, 1092 and 1093 # [[phab:T421114|T421114]] * 10:09 hashar: Added Bookworm based Jenkins agents to the pool Hostnames 1083 to 1089 # [[phab:T421114|T421114]] * 07:23 hashar: CI Jenkins: removed `blubber` label from all agents after having moved PipelineLib to use the `Docker` label {{!}} [[phab:T422234|T422234]] === 2026-04-06 === * 16:01 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/1268239 === 2026-04-03 === * 20:17 bd808: Unblock 2.54.0.0/16 ([[phab:T422238|T422238]]) * 17:25 bd808: Unblock 31.18.0.0/16 ([[phab:T422245|T422245]]) * 17:18 bd808: Unblock 2.54.128.0/19 ([[phab:T422238|T422238]]) * 16:18 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1264649 "add Python 3.14 to pywikibot jobs and separate lint tests" {{!}} [[phab:T421723|T421723]] * 09:26 hashar: integration: nuked pywikibot/core pre-commit cache # [[phab:T422242|T422242]] * 09:15 hashar: Added Bookworm based Jenkins agents to the pool with label `Docker`.
Hostnames are `integration-agent-docker-107*` # [[phab:T421114|T421114]] * 02:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1267398 === 2026-04-02 === * 16:50 thcipriani: restart jenkins * 15:15 bd808: Unblock 82.216.0.0/16 ([[phab:T421508|T421508]]) * 15:07 bd808: Unblock 95.90.0.0/15 ([[phab:T421485|T421485]]) * 11:19 James_F: Zuul: [oojs/ui] Drop ooui-ruby2.7-rake job, we're abandoning Ruby use there === 2026-04-01 === * 22:01 bd808: Unblock 109.144.0.0/12 ([[phab:T422019|T422019]]) * 20:16 bd808: Unblock 93.192.0.0/10 ([[phab:T421894|T421894]]) * 19:25 dancy: Updating buildkitd to v0.29.0 in gitlab-cloud-runners (prod) ([[phab:T415284|T415284]]) * 17:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/97 ([[phab:T420441|T420441]]) * 17:39 bd808: Unblock 94.134.0.0/15 ([[phab:T421866|T421866]]) * 16:31 dancy: Upgrade buildkit to 0.29.0 in staging gitlab-cloud-runners ([[phab:T415284|T415284]]) * 10:47 taavi: integration-castor05: free up a bit of disk space by deleting cache for AhoCorasick/ CLDRPluralRuleParser/ HtmlFormatter/ RelPath/ RunningStat/ IPSet/ === 2026-03-30 === * 22:01 bd808: Unblock 78.20.0.0/14 ([[phab:T421586|T421586]]) * 21:04 bd808: Unblock 95.88.0.0/15 ([[phab:T421774|T421774]]) * 20:49 bd808: Unblock 95.89.191.0/24 ([[phab:T421774|T421774]]) * 20:29 bd808: Unblock 73.162.0.0/16 ([[phab:T421549|T421549]]) * 13:10 hashar: gerrit: abandon mediawiki/core changes that are 2+ years old and are attached to a task (`Bug: Txxxx`) * 11:37 hashar: Reloaded Zuul to add 3 people to the allow list * 10:43 James_F: Docker: Re-pushing to try to create quibble-coverage 1.16.0-s2 === 2026-03-27 === * 21:00 James_F: Docker: [quibble-bullseye] Drop Python 2 from images * 11:28 hashar: deployment-prep: removed block for `143.176.0.0/15` and blocked subblock `143.176.0.0/16` instead.
This unblocks `143.177.0.0/16` # [[phab:T421420|T421420]] * 00:18 bd808: Unblock 95.90.238.0/23 ([[phab:T421447|T421447]]) === 2026-03-26 === * 21:25 bd808: Unblock 89.240.0.0/15 ([[phab:T421364|T421364]]) * 21:09 brennen: patchdemo: deploy to production for https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/merge_requests/312 === 2026-03-25 === * 20:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256318 [[phab:T421283|T421283]] * 15:46 dancy: Migrated gitlab-cloud-runners (prod) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 15:32 dancy: Migrated gitlab-cloud-runners (staging) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 10:01 hashar: Updating tox Jenkins jobs to add support for Python 3.14 {{!}} https://gerrit.wikimedia.org/r/1260632 {{!}} [[phab:T421209|T421209]] * 08:40 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20/ === 2026-03-24 === * 19:40 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255746 * 15:34 brennen: gitlab1004: manual test run of `configure-projects` with cleared issue allowlist ([[phab:T412882|T412882]]) * 15:26 bd808: Unblock 47.194.0.0/16 ([[phab:T421127|T421127]]) * 12:53 hashar: integration: deleted old Puppet 5 compiler agents from Jenkins ( pcc-worker1014.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1015.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1016.puppet-diffs.eqiad1.wikimedia.cloud ) # [[phab:T367399|T367399]] * 07:42 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1259755 === 2026-03-23 === * 15:28 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 90272 === 2026-03-22 === * 14:52 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1258082 * 01:00 Krinkle: Reloading Zuul to deploy 
https://gerrit.wikimedia.org/r/1256488 === 2026-03-21 === * 08:10 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256962 * 07:48 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256946 === 2026-03-20 === * 21:21 bd808: Unblock 103.159.218.0/24 ([[phab:T420530|T420530]]) * 14:59 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add dependency on CodeMirror, for [[phab:T399673|T399673]] === 2026-03-19 === * 16:54 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255777 * 16:01 Krinkle: Hoist l10n-bot rights from labs/tools parent to labs parent to reduce duplication in other labs/ repos * 15:50 Krinkle: Create labs/xtools repo (branch: main, parent: labs, owner: labs-xtools), ref [[phab:T402086|T402086]] === 2026-03-18 === * 21:11 dcausse: [[phab:T403775|T403775]]: reindexing all wikis to enable new sorting options * 21:08 dcausse: restarting opensearch on deployment-cirrussearch(12{{!}}13{{!}}14) instances to pickup new plugin versions * 14:56 James_F: Zuul: Handle wmf/next the same way as wmf/branch_cut_pretest * 14:52 James_F: Zuul: [GrowthExperiments] drop duplicate VisualEditor dep * 14:52 James_F: Zuul: [search/*] Add experimental Java 25 jobs === 2026-03-17 === * 22:50 James_F: Zuul: [mediawiki/extensions/JsonForms] Add quibble jobs * 21:27 James_F: Zuul: search: Update opensearch plugins for Java 11/17, for [[phab:T420407|T420407]] * 20:20 bd808: Resize deployment-sessionstore06 from g4.cores1.ram2.disk20 to g4.cores2.ram4.disk20 ([[phab:T415021|T415021]]) * 16:43 James_F: Zuul: [BlueSpicePermissionManager] Add …ConfigManager & …UserManager deps * 14:36 James_F: Zuul: [mediawiki/extensions/ArticleGuidance]: Add SpamBlacklist as phan dep, for [[phab:T420015|T420015]] === 2026-03-13 === * 13:59 andrewbogott: deleting ptr record 117.0.16.172.in-addr.arpa.
-- accidental duplicate for deployment-kafka-logging01.deployment-prep.eqiad1.wikimedia.cloud * 13:04 elukey: re-create kafka-logging-01 in deployment-prep on trixie and Kafka 3.7 (was running on buster) * 09:13 elukey: upgrade kafka-jumbo and kafka-main to Confluent 7.7 in deployment-prep (pre-requisite before being able to upgrade to Trixie) === 2026-03-12 === * 21:23 bd808: Hard reboot deployment-sessionstore06 ([[phab:T415021|T415021]]) * 01:14 James_F: Docker: [helm-linter] Bump for Envoy 1.35.9, for [[phab:T419637|T419637]] === 2026-03-11 === * 16:48 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/MetricsPlatform # [[phab:T417568|T417568]] * 16:47 James_F: Zuul: [mediawiki/extensions/MetricsPlatform] Archive, for [[phab:T416865|T416865]] * 11:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1250529 "inference-services: Split policy violation CI into separate model jobs." - [[phab:T418832|T418832]] === 2026-03-10 === * 17:39 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner production * 17:11 hashar: Updated MediaWiki coverage jobs so that they now keep "Generate a local configuration by running `composer phpunit:config`" message # [[phab:T419073|T419073]] * 16:41 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner staging * 08:21 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 === 2026-03-09 === * 21:53 bd808: Reboot deployment-shellbox01 on the off chance that it makes the new permissions error go away ([[phab:T419440|T419440]]) * 13:13 James_F: Zuul: [mediawiki/extensions/WikiShare] Mark as archived, for [[phab:T413589|T413589]] * 13:11 James_F: Zuul: [mediawiki/extensions/Memento] Mark as archived, for [[phab:T369991|T369991]] * 13:10 James_F: Zuul: [mediawiki/extensions/QuickGV] Mark as archived, for [[phab:T413348|T413348]] * 13:10 James_F: Zuul: [mediawiki/extensions/SemanticImageInput] Mark as archived, for
[[phab:T413588|T413588]] * 13:09 James_F: Zuul: [mediawiki/extensions/SidebarDonateBox] Mark as archived, for [[phab:T413587|T413587]] * 13:07 James_F: Zuul: [mediawiki/extensions/SemanticSifter] Mark as archived, for [[phab:T413586|T413586]] * 13:06 James_F: Zuul: [mediawiki/extensions/GoogleAdSense] Mark as archived, for [[phab:T413585|T413585]] * 13:04 James_F: Zuul: [mediawiki/extensions/SecurityAPI] Mark as archived, for [[phab:T418008|T418008]] * 12:50 James_F: Zuul: [mediawiki/extensions/CheckUser] Add DiscussionTools dependency * 12:50 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add dependencies for TestKitchen * 10:40 hashar: gerrit: mediawiki/vendor: converted `es6` and `es710` branches to tags # [[phab:T417804|T417804]] * 09:24 hashar: Updating Quibble jobs to 1.16.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1248880 {{!}} [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 09:15 hashar: updating all CI Jenkins jobs using `./jjb-update` === 2026-03-06 === * 19:46 James_F: Zuul: [mediawiki/services/geoshapes] Mark as archived, for [[phab:T418372|T418372]] * 16:37 hashar: Building Docker images for Quibble 1.16.0 * 16:31 hashar: Tag Quibble 1.16.0 @ {{Gerrit|0b9db5fe3cabb2cec0b5d44e128bafa917b3b895}} # [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 12:32 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248411 "jjb, Zuul: vary Wikibase Selenium for release branches" {{!}} [[phab:T418797|T418797]] * 12:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248409/ "jjb, Zuul: rename wikibase-selenium job for clarity" {{!}} [[phab:T418797|T418797]] === 2026-03-05 === * 14:41 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add TestKitchen as a dependency for [[phab:T418053|T418053]] * 08:01 hashar: Reloaded Zuul to rename wikibase-client / wikibase-repo jobs {{!}} https://gerrit.wikimedia.org/r/1238317 * 00:04 James_F: Docker: 
[quibble-coverage] Use local PHPUnit config, for [[phab:T345481|T345481]] === 2026-03-04 === * 21:16 James_F: Zuul: [mediawiki/core] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 21:10 James_F: Zuul: [mediawiki/vendor] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 19:48 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/96 ([[phab:T419004|T419004]]) * 18:50 James_F: Revert "Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency", for [[phab:T419043|T419043]] * 16:23 James_F: Zuul: [mediawiki/services/parsoid] Make PHP 8.4 voting * 15:37 James_F: Docker: [rake-ruby2.7] Add libffi-dev too, for [[phab:T418463|T418463]] * 13:59 James_F: Docker: [rake-ruby2.7] Add ruby-ffi for [[phab:T418463|T418463]] * 13:54 hashar: SIGKILL Zuul cause it can't gracefully stop most probably due to being locked attempting to report back to Gerrit # [[phab:T419009|T419009]] * 13:49 hashar: Stopping Zuul # [[phab:T419009|T419009]] * 13:41 hashar: Took a Zuul stack dump on contint1002.wikimedia.org using SIGUSR1 # [[phab:T419009|T419009]] === 2026-03-03 === * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] Drop MetricsPlatform phan dep * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Drop MetricsPlatform phan dep === 2026-03-02 === * 22:13 James_F: Zuul: Enforce PHP 8.4 in MW extensions and skins for development branch, for [[phab:T386108|T386108]] * 14:05 James_F: Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency, for [[phab:T415451|T415451]] * 13:48 James_F: Zuul: […/WikimediaEvents] Drop LoginNotify dependency, now unused, for [[phab:T404334|T404334]] * 10:16 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/15.8.2/ # [[phab:T418718|T418718]] ===
2026-02-28 === * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/skins` # [[phab:T418675|T418675]] * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/extensions` # [[phab:T418675|T418675]] === 2026-02-27 === * 15:53 dancy: Updating gitlab-cloud-runners (staging and prod) to gitlab-runner 18.9.0. === 2026-02-26 === * 20:16 James_F: Zuul: Provide a custom, high-priority pipeline just for puppet compiler [[phab:T414621|T414621]] * 19:32 James_F: Docker: Bump all the PHPs. * 13:40 hashar: Deployed Jenkins job https://integration.wikimedia.org/ci/job/wikibase-selenium/ # [[phab:T287582|T287582]] * 00:13 dduvall: forcing replacement of buildkitd helm release in gitlab-cloud-runner prod cluster due to dependency on removed k8s secret ([[phab:T416260|T416260]]) === 2026-02-25 === * 23:50 dduvall: deploying https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/552 to gitlab-cloud-runner production cluster ([[phab:T416260|T416260]]) * 14:07 James_F: Zuul: [mediawiki/extensions/CommunityRequests] Add TemplateData dependency, for [[phab:T401638|T401638]] * 00:08 jeena: no-op testing updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/95 === 2026-02-24 === * 15:55 brennen: devtools: test deploy phab/phorge to test instance ([[phab:T418256|T418256]]) === 2026-02-23 === * 23:07 jeena: Updated development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:43 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:12 bd808: Unblock 191.80.192.0/18 ([[phab:T418132|T418132]]) * 20:26 hashar: Deleted "replication-upstream" Grafana dashboard in favor of a copy/new "replication" one. 
https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream , replaced it by https://grafana.wikimedia.org/d/d4a4da73-c27f-4ce6-a9e5-ab84dd7a4ebb/replication * 16:29 James_F: Zuul: [3d2png] Add basic Node CI at version 20 === 2026-02-20 === * 21:47 bd808: Unblock 168.184.84.0/24 ([[phab:T418020|T418020]]) * 17:13 bd808: Unblock 122.187.64.0/18 ([[phab:T417964|T417964]]) * 14:35 James_F: Zuul: [mediawiki/extensions/Monstranto] Move out of Wikimedia prod section === 2026-02-19 === * 18:34 bd808: Unblock 181.98.0.0/16 ([[phab:T417890|T417890]]) * 17:21 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Add AbuseFilter as a dependency, for [[phab:T417799|T417799]] * 13:22 hashar: Reloaded Zuul to archive the Cergen repository {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1240688 {{!}} [[phab:T417887|T417887]] === 2026-02-18 === * 20:17 jeena: Updating development images on contint primary for [[phab:T415922|T415922]] * 19:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240360 * 18:40 bd808: Unblock 46.59.0.0/17 ([[phab:T417747|T417747]]) * 17:05 hashar: Regenerating Jenkins jobs with JJB based on https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 17:04 hashar: Added EXT_DEPENDENCIES to Quibble Jenkins jobs parameters so we can manually trigger them from the Web UI using a different set of deps # https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 16:30 hashar: Triggered https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ with empty Zuul parameters introduced by https://gerrit.wikimedia.org/r/1240333 {{!}} https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/4893/console * 15:43 James_F: Zuul: [mediawiki/extensions/ReadingLists] Add EventBus dependency for [[phab:T417706|T417706]] * 12:15 hashar: zuul-1001.zuul3.eqiad1.wikimedia.cloud: added keepalive=20 to the scheduler Gerrit driver and restarted scheduler container # [[phab:T417497|T417497]] * 06:58 jeena: 
Updating development images on contint primary for [[phab:T415922|T415922]] === 2026-02-17 === * 23:37 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240081 * 23:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240078 * 15:58 brennen: deployed latest phab/phorge wmf/stable to devtools test instance ([[phab:T417657|T417657]]) * 09:01 hashar: Reloaded Zuul to enable php 8.5 testing on utfnormal, php-session-serializer, wikipeg, mediawiki/libs/Dodo, mediawiki/libs/UUID, testing-access-wrapper and translatewiki # [[phab:T406326|T406326]] === 2026-02-16 === * 15:27 hashar: Manually cleaned some old workspaces on integration-agent-docker-1042 === 2026-02-12 === * 20:07 James_F: Zuul: Enable PHP 8.5 jobs for most MW libraries, for [[phab:T406326|T406326]] * 19:33 James_F: Docker: [php83] Re-build with upstream's new 8.3.30 release and cascade * 19:31 James_F: Zuul: Add PHP 8.5 CI job to various things noted as blocked by Phan, for [[phab:T410941|T410941]], [[phab:T406326|T406326]] * 16:35 Krinkle: Disable publishing noise on tasks from repos Bcp47, clover-diff, ScopedCallback, and IDLeDOM. 
Ref [[phab:T143162|T143162]] * 15:53 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/87 * 11:21 James_F: Zuul: [mediawiki/libs/shellbox] Add direct Phan job, for [[phab:T416064|T416064]] === 2026-02-10 === * 20:16 dancy: Rebooted k3s.catalyst-dev (it was unresponsive, but the reboot hasn't helped) === 2026-02-09 === * 21:58 James_F: Zuul: [mediawiki/tools/phan] Add PHP 8.5 CI job, for [[phab:T410941|T410941]] * 19:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1238006 [[phab:T415680|T415680]] * 11:51 James_F: Zuul: [mediawiki/extensions/ReadingLists] Drop MetricsPlatform dependency, for [[phab:T414435|T414435]] === 2026-02-05 === * 17:58 James_F: Zuul: […/WikimediaCustomizations] Add six new dependencies for [[phab:T404334|T404334]] * 15:35 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1237254 * 15:18 James_F: Zuul: […/OATHAuth] Add dependency and phan dependency on CentralAuth === 2026-02-04 === * 12:54 James_F: Zuul: [mediawiki/extensions/Petition] Add CLDR dependency * 10:03 hashar: Restarted Jenkins on releases2003.codfw.wmnet === 2026-02-02 === * 21:17 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1234926 "re-enable master jobs for some BlueSpice repos - [[phab:T403196|T403196]]" * 21:05 bd808: Unblock 85.146.0.0/17 ([[phab:T416079|T416079]]) * 19:47 James_F: Zuul: […/WikimediaCustomizations] Add cldr phan dependency, for [[phab:T404334|T404334]] * 17:33 bd808: Unblock 188.188.0.0/15 ([[phab:T416095|T416095]]) * 17:26 bd808: Unblock 85.94.84.0/22 ([[phab:T416105|T416105]]) * 17:09 bd808: Unblock 94.234.0.0/16 ([[phab:T416165|T416165]]) * 16:51 dancy: Update gitlab-runners to alpine-v18.6.6 ([[phab:T415214|T415214]]) * 16:27 bd808: Unblock 47.231.208.0/21 ([[phab:T416010|T416010]]) * 11:39 James_F: Zuul: […/WikimediaCustomizations] Add five new phan dependencies, for [[phab:T404334|T404334]] * 09:45
Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 58532, 58557 === 2026-01-31 === * 21:49 James_F: Deleted Jenkins's job entry for castor-save-workspace-cache {{Gerrit|6193776}} and this seems to have unstuck things for [[phab:T416078|T416078]]? * 21:45 James_F: Running `sudo systemctl restart jenkins` on contint for [[phab:T416078|T416078]] * 21:44 James_F: Fighting [[phab:T416078|T416078]], took integration-castor-5 offline, disconnected, sshed in to kill threads, then reconnected; no change in aspect. * 19:03 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1235380 === 2026-01-28 === * 21:26 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn # [[phab:T415832|T415832]] * 21:11 bd808: Unblock 181.160.0.0/15 & 186.40.128.0/17 ([[phab:T415820|T415820]]) * 17:01 bd808: Unblock 102.182.0.0/16 ([[phab:T415782|T415782]]) === 2026-01-27 === * 16:45 James_F: Zuul: Switch skin-quibble template with identical extension-quibble, for [[phab:T402398|T402398]] * 16:18 James_F: Zuul: [ArticleGuidance] mention it will be in production * 15:55 James_F: Docker: [quibble-bullseye] Update to Quibble 1.15.0 * 15:12 James_F: Docker: [quibble-coverage] Pass PHPUnit config location explicitly, for [[phab:T395470|T395470]] * 09:18 hashar: integration: on integration-castor05, deleted caches for old MediaWiki branches * 09:15 hashar: integration: on pkgbuilder instances, removed Buster cow images, aptcache and hooks. 
`sudo cumin --force -p 0 'name:pkgbuilder' 'rm -fR /srv/pbuilder/<nowiki>{</nowiki>base-buster-amd64.cow,hooks/buster,aptcache/buster-amd64<nowiki>}</nowiki>'` # [[phab:T397209|T397209]] * 09:14 hashar: integration: cleaned up old workspaces under /srv/jenkins/workspace === 2026-01-26 === * 23:27 bd808: Unblock 66.130.0.0/15 ([[phab:T415596|T415596]]) * 22:52 bd808: Unblock 45.16.0.0/12 ([[phab:T415467|T415467]]) * 14:46 hashar: gerrit: changed `operations/software/permissions` project type from `CODE` to `PERMISSIONS` by pointing `HEAD` to `refs/meta/config` === 2026-01-22 === * 17:36 James_F: Docker: [quibble-coverage] Stop using legacy PHPUnit entrypoint ([[phab:T395470|T395470]]) & Stop excluding Dump/ParserFuzz/Stub groups ([[phab:T415230|T415230]]) * 15:11 James_F: Zuul: [mediawiki/extensions/Math] Add a standalone job, for [[phab:T415230|T415230]] === 2026-01-20 === * 20:38 bd808: Cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229186 ([[phab:T415113|T415113]]) * 19:05 bd808: Rebooted deployment-cache-text08 to see if the mystery haproxy startup failure would go away ([[phab:T415100|T415100]]) * 18:50 bd808: Unblock 152.7.0.0/16 ([[phab:T415100|T415100]]) === 2026-01-17 === * 23:32 ori: beta-scap with `php_l10n: true` completed successfully: https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/241466/console. PHP l10n files generated. Reverted local change to scap.cfg. * 23:26 ori: Temporarily set `php_l10n: true` on deployment-deploy04:/etc/scap.cfg to see if next scap succeeds. 
=== 2026-01-16 === * 16:33 dancy: Deleting deployment-mx03.deployment-prep ([[phab:T412975|T412975]]) === 2026-01-15 === * 14:50 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/ArticleSummaries/ # [[phab:T413232|T413232]] === 2026-01-14 === * 17:14 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226907 * 16:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226893 * 15:57 bd808: Unblock 190.60.63.0/24 ([[phab:T414541|T414541]]) === 2026-01-13 === * 15:04 James_F: Zuul: Make quibble-for-mediawiki-core-vendor-mysql-php84 voting, for [[phab:T386108|T386108]] === 2026-01-12 === * 21:33 zabe: zabe@deployment-mwmaint03:~$ foreachwiki migrateLinksTable.php --table imagelinks # [[phab:T413668|T413668]] * 21:06 bd808: Unblock 66.81.168.0/21 ([[phab:T414303|T414303]]) * 17:42 dancy: Turned off instance deployment-prep.deployment-mx03 * 11:44 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 46331, 46344 === 2026-01-10 === * 21:48 taavi: reload zuul for https://gerrit.wikimedia.org/r/1224782 * 00:25 bd808: Unblock 91.160.0.0/12 ([[phab:T414190|T414190]]) === 2026-01-09 === * 17:33 thcipriani: re-enabling beta update jobs after test bad extension-list [[phab:T411516|T411516]] * 17:09 thcipriani: disabling beta update jobs to test bad extension-list [[phab:T411516|T411516]] === 2026-01-08 === * 21:30 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224815 [[phab:T414136|T414136]] * 18:24 bd808: Unblock 89.80.0.0/12 ([[phab:T414113|T414113]]) * 15:55 dancy: Upgrading gitlab-runner to v18.5.0 on gitlab-cloud-runners.
([[phab:T414053|T414053]]) === 2026-01-07 === * 23:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1082574 https://gerrit.wikimedia.org/r/1224157 https://gerrit.wikimedia.org/r/1224159 * 23:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/896311 [[phab:T27482|T27482]] * 23:06 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224218 * 17:34 James_F: Zuul: Add new extensions: IssueTrackerLinks, PreviewLinks, and WikiRAG * 17:34 James_F: Zuul: [labs/tools/heritage] Point to the task to drop 8.1 testing * 15:09 James_F: Zuul: [labs/tools/heritage] Add testing in PHP 8.2+, not just PHP 8.1 * 15:03 James_F: Zuul: Even for extension-broken, don't offer PHP 8.1 testing * 15:02 James_F: Zuul: Move quibble experimental sqlite/postgres tests to PHP 8.3 === 2026-01-06 === * 16:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223690 [[phab:T411814|T411814]] * 16:16 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223189 [[phab:T411814|T411814]] * 00:30 bd808: Unblock 85.134.128.0/17 ([[phab:T413755|T413755]]) * 00:02 bd808: Unblock 89.166.128.0/17 ([[phab:T413702|T413702]]) === 2026-01-05 === * 23:57 bd808: Unblock 185.233.104.0/22 ([[phab:T413472|T413472]]) * 23:51 bd808: Unblock 45.62.112.0/21 ([[phab:T413079|T413079]]) * 23:44 bd808: Unblock 85.134.200.0/21 ([[phab:T413067|T413067]]) * 19:03 dancy: Updated buildkitd to v0.26.3 in gitlab-cloud-runners * 14:27 taavi: reload zuul for {{Gerrit|1223191}} * 13:57 James_F: Zuul: [mediawiki/php/wmerrors] Enable PHP 8.5 testing, for [[phab:T410921|T410921]] === 2026-01-03 === * 17:59 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222709 https://gerrit.wikimedia.org/r/1220388 https://gerrit.wikimedia.org/r/1219140 === 2026-01-02 === * 17:10 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222597 === 2026-01-01 === * 02:34 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1221644 <noinclude>'''Server Admin 
Log''' logged from {{IRC|wikimedia-releng}} for [[Nova Resource:Deployment-prep|Beta Cluster]], [[mw:Continuous integration|Continuous integration]] and various other Release Engineering projects.</noinclude> {{SAL-archives/Release Engineering}} <noinclude>[[Category:SAL]]</noinclude> d1u9fd8cwfi7ixzdd2ed6tp2q6lxw1q 2414283 2414257 2026-05-15T17:59:13Z Stashbot 7414 dancy: Upgrading gitlab-cloud-runners (staging) from 1.35.1-do.5 to 1.35.1-do.6 (T426436) 2414283 wikitext text/x-wiki === 2026-05-15 === * 17:59 dancy: Upgrading gitlab-cloud-runners (staging) from 1.35.1-do.5 to 1.35.1-do.6 ([[phab:T426436|T426436]]) * 13:02 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1287828 [[phab:T426392|T426392]] === 2026-05-13 === * 12:42 James_F: Zuul: [mediawiki/extensions/Springboard] Add AdminLinks Phan dependency * 12:42 James_F: Zuul: [mediawiki/extensions/ChatBot] Add dependencies on VisualEditor and BlueSpiceFoundation * 12:42 James_F: Zuul: [mediawiki/extensions/ChatIntegration] Add dependency on VisualEditor * 12:37 James_F: Zuul: [mediawiki/extensions/WikiLambda] Drop AF and SB deps down to phan-only, for [[phab:T423180|T423180]] === 2026-05-12 === * 20:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/104 ([[phab:T424774|T424774]]) * 18:08 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add AF and SB deps for [[phab:T423180|T423180]] * 14:18 atsukoito: PrivateSettings: empty $wgOpensearchCredentials for opensearch-on-k8s synced to deploy04 by Reedy * 13:04 atsukoito: PrivateSettings: credentials for opensearch-on-k8s ttmserver-test * 11:50 James_F: Zuul: [machinelearning/liftwing/inference-services] Add qwen36 llm model CI/CD pipelines, for [[phab:T425680|T425680]] * 11:46 James_F: Zuul: Add experimental php-pie-build* jobs to other PHP extensions, for [[phab:T425943|T425943]] * 11:37 James_F: Zuul: [mediawiki/php/wikidiff2] Add experimental php-pie-build* jobs, for 
[[phab:T425943|T425943]] * 10:05 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-with-Wikibase-extensions-browser-tests-only-vendor-php83 # fix failure seen in quibble-with-Wikibase-extensions-browser-tests-only-vendor-php83 7817 * 08:44 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/ # broken Cypress cache? hopefully fix failure seen in quibble-vendor-mysql-php83-selenium 51633 === 2026-05-11 === * 18:28 James_F: Docker: Add changes to php-compile images for PIE, for [[phab:T425943|T425943]] * 16:06 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/ # broken Cypress cache? hopefully fix failure seen in quibble-vendor-mysql-php83-selenium 51439 and 51452 === 2026-05-09 === * 20:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1285498 === 2026-05-07 === * 22:53 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/105 === 2026-05-06 === * 18:13 bd808: Unblock 88.165.192.0/19 * 18:03 bd808: Unblock 94.208.0.0/14 * 17:56 bd808: Unblock 84.226.0.0/16 * 17:41 bd808: Unblock 94.34.0.0/16 * 17:35 bd808: Unblock 109.134.0.0/16 === 2026-05-05 === * 21:20 James_F: Zuul: Provide Node 26 experimental jobs everywhere needed * 21:04 James_F: Docker: Provide initial Node 26 images * 19:01 James_F: Zuul: [mediawiki/extensions/PageAssessments] Add Scribunto dependency, for [[phab:T396135|T396135]] * 14:58 dancy: rm /var/log/<nowiki>{</nowiki>user.log.1,syslog.1,messages.1<nowiki>}</nowiki> on deployment-eventgate-4.deployment-prep ([[phab:T425429|T425429]]) === 2026-05-04 === * 15:19 dancy: Upgrading gitlab
cloud runners (prod) from 1.35.1-do.3 to 1.35.1-do.5 * 14:51 dancy: Upgrading gitlab cloud runners (staging) from 1.35.1-do.3 to 1.35.1-do.5 * 10:40 James_F: Zuul: Provide non-voting PHP 8.4/8.5 Quibble jobs for bluespice template === 2026-05-02 === * 20:49 James_F: Zuul: [mediawiki/core] Enforce PHP 8.4 & 8.5 on release branches, all pass * 19:27 James_F: Zuul: Provide non-voting PHP 8.4/8.5 Quibble jobs for MW release branches * 19:19 James_F: Zuul: [mediawiki/extensions/BlogPage] Add dependencies * 16:48 James_F: Hard-restarting Zuul to clear the huge number of i18n updates being re-submitted. * 15:48 James_F: Zuul: [wikimedia-cz/*] Test in PHP 8.3+, dropping 8.2 * 14:02 TheresNoTime: Add bvibber to deployment-prep project * 09:08 James_F: Docker: [quibble-*] Add php-luasandbox so we can test both modes in Scribunto === 2026-05-01 === * 15:42 James_F: Zuul: [wikimedia/lucene-explain-parser] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: Zuul: [wikimedia/textcat] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: Zuul: [mediawiki/tools/ParseWiki] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: zuul: Add ToprakM to CI allowlist * 15:19 James_F: Zuul: [translatewiki] Test in PHP 8.3+, dropping 8.2 * 15:10 James_F: Zuul: [mediawiki/extensions/WikiEditor] Add TestKitchen as a dependency, for [[phab:T425076|T425076]] * 12:40 James_F: Zuul: [mediawiki/tools/code-utils] Test in PHP 8.3+, dropping 8.2 * 08:02 James_F: Zuul: Update xtex's e-mail in the allowlist * 07:37 James_F: Zuul: Switch release branches' selenium jobs to PHP 8.3 * 07:33 James_F: Zuul: Test Wikimedia production libraries in PHP 8.3+, dropping 8.2 === 2026-04-30 === * 21:36 brennen: gitlab-webhooks: building & restarting to deploy https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/40 * 20:26 James_F: Zuul: [mediawiki/tools/api-testing] Make PHP 8.5 CI voting * 20:16 James_F: jforrester@doc1004:~$ # sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn/ # 
[[phab:T415832|T415832]] * 20:14 James_F: Zuul: [mediawiki/extensions/WebAuthn] Archive, for [[phab:T415832|T415832]] / [[phab:T303495|T303495]] * 17:16 brennen: wikibugs: most maintainers at hackathon, so go release-engineering added as a maintainer while looking to debug error at https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/jobs/810904 * 15:19 mutante: upgrading zuul to 14.2.0-1 on "new zuul" machines ([[phab:T424879|T424879]]) === 2026-04-29 === * 15:49 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add ConfirmEdit dependency, for [[phab:T424597|T424597]] * 15:36 James_F: Zuul: Drop experimental node22 jobs, never used in practice * 15:28 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1279392, https://gerrit.wikimedia.org/r/1279397 === 2026-04-28 === * 18:11 bd808: Unblock 86.0.0.0/16 * 17:41 bd808: Unblock 79.192.0.0/10 * 17:07 James_F: Zuul: [mediawiki/tools/phpunit-patch-coverage] Drop PHP 8.2 testing * 17:07 James_F: Zuul: [mediawiki/tools/minus-x] Drop PHP 8.2 testing * 17:07 James_F: Zuul: [mediawiki/tools/codesniffer] Drop PHP 8.2 testing * 16:32 James_F: Zuul: [mediawiki/services/jobrunner] Drop PHP 8.2 testing * 13:34 James_F: Zuul: [mediawiki/tools/phan] Drop PHP 8.2 testing * 13:34 James_F: Zuul: [oojs/ui] Drop PHP 8.2 testing * 13:14 James_F: Zuul: [mediawiki/tools/phan/SecurityCheckPlugin] Drop PHP 8.2 CI * 10:40 Silvan_WMDE: sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwext-node24-rundoc/ # run on integration-castor06.integration.eqiad1.wikimedia.cloud to fix failure seen in mwext-node24-rundoc #1717 * 00:03 bd808: Increase parallelism for wmf-beta-update-databases.py ([[phab:T256168|T256168]]) === 2026-04-27 === * 22:11 bd808: Beta Cluster MediaWiki update logs now available via https://beta-update.wmcloud.org/ ([[phab:T256168|T256168]]) * 21:57 bd808: Add web security group to deployment-deploy04 ([[phab:T256168|T256168]]) * 20:45 James_F: Zuul: Restrict mw*-codehealth-patch jobs 
to master only, for [[phab:T424573|T424573]] * 17:16 James_F: Docker: [mediawiki-phan-taint-check-demo] Re-platform to Trixie and so PHP 8.4 * 15:53 James_F: Zuul: [mediawiki/extensions/ReportIncident] Add TestKitchen phan dependency, for [[phab:T424220|T424220]] * 14:32 James_F: Zuul: Drop PHP 8.2 enforcement from MediaWiki things for master and REL1_46 for [[phab:T358667|T358667]] * 12:38 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mwext-node24-docs-publish # fix failure seen in mwext-node24-docs-publish 383 * 09:18 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover/mediawiki-libs-node-cssjanus/ # [[phab:T424419|T424419]] === 2026-04-26 === * 20:49 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1276777 === 2026-04-24 === * 22:48 dduvall: merged zuul3 branch of integration/config into master and pushed (in preparation for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1277198) * 12:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1276428 === 2026-04-23 === * 23:57 bd808: Set `profile::beta::autoupdater::run_updater: true` for deployment-deploy04 via Horizon ([[phab:T256168|T256168]]) * 22:58 bd808: bd808@deployment-deploy04 `sudo -u jenkins-deploy /usr/local/bin/wmf-beta-update-all` * 22:36 bd808: bd808@deployment-deploy04 `sudo -u mwdeploy /usr/local/bin/wmf-beta-update-all` * 22:16 bd808: Disabled https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad so that replacement script can be tested ([[phab:T256168|T256168]]) * 22:12 bd808: Disabled https://integration.wikimedia.org/ci/job/beta-code-update-eqiad so that replacement script can be tested ([[phab:T256168|T256168]]) * 22:02 bd808: Cherry-picked {{gerrit|1276813}} to deployment-puppetserver-1 ([[phab:T256168|T256168]]) * 20:11 James_F: Zuul: [wikibase/*] Replace CI testing in Node 20 with Node 24 * 20:11 James_F: Zuul: 
[wikidata/query/*] Replace CI testing in Node 20 with Node 24 * 20:11 James_F: Zuul: [analytics/*] Replace CI testing in Node 20 with Node 24 * 20:10 James_F: Zuul: [mediawiki/tools/*] Replace CI testing in Node 20 with Node 24 * 20:06 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.34.5-do.3 to 1.35.1-do.3 ([[phab:T423726|T423726]]) * 19:55 James_F: Zuul: [jquery-client] Replace CI testing in Node 20 with Node 24 * 19:51 James_F: Zuul: [wikipeg] Drop testing in Node 20 and Node 22 * 19:47 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.34.5-do.3 to 1.35.1-do.3 ([[phab:T423726|T423726]]) * 19:37 James_F: Zuul: [oojs/ui] Drop CI testing in Node 20 and Node 22 * 19:37 James_F: Zuul: [oojs/js] Drop CI testing in Node 20 and Node 22 * 19:37 James_F: Zuul: [unicodejs] Replace CI testing in Node 20 with Node 24 * 19:36 James_F: Zuul: [wikimedia/portals] Drop CI testing in Node 20 and Node 22 * 18:57 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.33.9-do.3 to 1.34.5-do.3 ([[phab:T423726|T423726]]) * 18:39 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.33.9-do.3 to 1.34.5-do.3 ([[phab:T423726|T423726]]) * 18:18 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.33.9-do.2 to 1.33.9-do.3 ([[phab:T423726|T423726]]) * 17:58 James_F: Zuul: [mediawiki/extensions/OAuth] Add dependency on CentralAuth, for [[phab:T415281|T415281]] * 17:56 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.32.13-do.2 to 1.33.9-do.3 ([[phab:T423726|T423726]]) * 16:35 James_F: Zuul: Enforce PHP 8.5 CI for MW things in master (and REL1_46), for [[phab:T411814|T411814]] * 16:19 James_F: Zuul: [mediawiki/services/parsoid] Enable PHP 8.5 CI * 15:47 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add AntiSpoof dependency, for [[phab:T420548|T420548]] * 14:20 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24 # fix failure seen in 
mediawiki-node24 8385 and 8405 * 12:56 James_F: Zuul: [mediawiki/extensions/GrowthExperiments] Add CentralNotice dependency, for [[phab:T422082|T422082]] === 2026-04-22 === * 00:07 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add MF dependency, for [[phab:T424113|T424113]] === 2026-04-21 === * 23:26 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add CommunityConfiguration dep too, for [[phab:T394410|T394410]] * 23:17 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add standalone test jobs, for [[phab:T422031|T422031]] * 20:47 inflatador: updating cirrussearch hosts to Trixie/OpenSearch 2 [[phab:T421763|T421763]] * 20:38 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add CommunityConfiguration phan dep, for [[phab:T394410|T394410]] * 20:17 bd808: Running tofu for [[phab:T421244|T421244]] * 18:00 James_F: Zuul: [mediawiki/extensions/WatchAnalytics] Add ApprovedRevs Phan dependency * 16:35 bd808: Unblock 79.116.0.0/16 * 13:34 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add TestKitchen phan dep, for [[phab:T415254|T415254]] * 13:27 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add CentralAuth dependency, for [[phab:T420548|T420548]] === 2026-04-20 === * 23:56 bd808: Unblock 76.157.0.0/16 * 18:28 dancy: Upgrading gitlab cloud runners (staging) to 1.33.9-do.2 ([[phab:T423726|T423726]]) * 18:28 dancy: Upgrading gitlab cloud runners (staging) ([[phab:T423726|T423726]]) * 18:19 James_F: jjb: All 486 (!) 
jobs now updated for [[phab:T423622|T423622]] * 18:18 bd808: Unblock 113.128.0.0/15 * 15:03 James_F: Docker: Bump ci-bullseye/-bookworm/-trixie for mirrors.wm.org removal, [[phab:T423622|T423622]] === 2026-04-19 === * 19:53 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272752 === 2026-04-17 === * 21:07 thcipriani: marking integration-agent-1080 offline for experimentation * 19:30 thcipriani: reconfiguring castor-save-workspace-cache with https://gerrit.wikimedia.org/r/1273935 * 17:47 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.32.10-do.1 to 1.32.13-do.2 ([[phab:T423726|T423726]]) * 16:49 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.32.10-do.1 to 1.32.13-do.2 ([[phab:T423726|T423726]]) === 2026-04-16 === * 20:49 dduvall: creating integration/zuul-jobs repo to serve as a mirror of opendev.org/zuul/zuul-jobs ([[phab:T406384|T406384]]) * 13:38 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272711 [[phab:T423568|T423568]] * 11:07 Silvan_WMDE: sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24/ # run on integration-castor06.integration.eqiad1.wikimedia.cloud === 2026-04-15 === * 20:05 James_F: Zuul: Configure REL1_46 CI, for [[phab:T423257|T423257]] * 17:44 bd808: Unblock 176.0.0.0/13 * 17:39 bd808: Unblock 46.128.0.0/16 * 17:32 bd808: Unblock 176.86.0.0/16 * 16:39 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/commit/127d783b2176ac60b646a5fa4f1b1a872ca66340 * 15:33 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/100 * 01:02 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/99 === 2026-04-14 === * 20:42 James_F: Docker: [composer-scratch] Upgrade composer to 2.9.7 and cascade * 16:35 bd808: Unblock 88.112.0.0/14 * 00:48 bd808: Unblock 24.6.0.0/16 
* 00:42 bd808: Unblock 152.231.48.0/20 === 2026-04-13 === * 22:00 James_F: Zuul: [mediawiki/vendor] Drop accidental Wikibase browser tests on branches * 20:28 James_F: Zuul: [mediawiki/extensions/Chart] Drop Doxygen publish job, not used * 14:42 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add FlaggedRevs dep, for [[phab:T421011|T421011]] === 2026-04-12 === * 18:21 James_F: jforrester@contint1002:~$ sudo /usr/sbin/service zuul restart && tail -f -n100 /var/log/zuul/zuul.log # [[phab:T423027|T423027]] === 2026-04-10 === * 23:22 James_F: jforrester@contint1002:~$ zuul enqueue --trigger gerrit --pipeline postmerge --project mediawiki/extensions/ReadingLists --change {{Gerrit|1269498}},2 # [[phab:T422976|T422976]] * 23:20 James_F: Zuul: [mediawiki/extensions/ReadingLists] Publish JS coverage, for [[phab:T422976|T422976]] * 23:13 James_F: Zuul: Migrate a few straggler Node 20 MediaWiki things to Node 24 * 23:01 James_F: Zuul: Move all MediaWiki things from mediawiki-node20 to mediawiki-node24 * 21:59 James_F: Docker: Bump Node base images to March releases and cascade; Upgrade Quibble images from Node 20 to Node 24 * 10:24 hashar: Updating all Quibble jobs to 1.17.1 * 10:22 hashar: Updated PostgreSQL jobs to Quibble 1.17.1 # [[phab:T422110|T422110]] * 10:22 hashar: Updated apitesting job to Quibble 1.17.1 # [[phab:T422843|T422843]] [[phab:T418743|T418743]] * 09:51 hashar: Tag Quibble 1.17.1 @ {{Gerrit|0a1ab3b7c3dfee36c9bc2e9b049957d94e190e85}} === 2026-04-09 === * 15:13 hashar: Rolling back Quibble jobs to 1.16.0 (api-testing stage fails due to missing npm install step) * 14:58 hashar: Upgrading Quibble jobs to 1.17.0 * 14:23 hashar: Tagged Quibble 1.17.0 @ {{Gerrit|864381c6b63bdbcd8c74a3162c406fffcaaf8694}} * 07:48 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1268559 "Zuul: use standalone jobs for GrowthExperiments Cypress tests" {{!}} [[phab:T417412|T417412]] === 2026-04-08 === * 22:19 dancy: Updating docker-pkg
files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/1269068 * 22:01 bd808: Unblock 95.216.12.170/32 ([[phab:T422751|T422751]]) * 19:26 brennen: gitlab-webhooks: building & deploying https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/37 - hitting some build tooling stuff, trying a fix per instructions in the error log * 17:54 bd808: Unblock 167.56.0.0/13 ([[phab:T422721|T422721]]) * 06:31 hashar: Deleted integration-agent-castor05 Bullseye instance, replaced by integration-agent-castor06 which is on Bookworm # [[phab:T421114|T421114]] * 06:24 hashar: Deleted integration-agent-qemu-1003 Bullseye image, replaced by integration-agent-qemu-1004 which is on Bookworm # [[phab:T422488|T422488]] === 2026-04-07 === * 22:25 dduvall: adding new pipelinelib labels to ci nodes ([[phab:T422234|T422234]]) * 20:05 hashar: Triggered a build of https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/ * 17:06 dduvall: added `Docker` label to `contint` jenkins nodes ([[phab:T422507|T422507]]) * 17:05 dduvall: restored missing `pipelinelib` labels on `integration-agent-docker-` CI hosts ([[phab:T422507|T422507]]) * 16:53 bd808: Unblock 73.0.0.0/8 ([[phab:T422498|T422498]]) * 12:36 hashar: jjb: use $CASTOR_HOST for Quibble success cache. https://gerrit.wikimedia.org/r/1268545 {{!}} This causes the Quibble jobs to use a new instance for the success cache, which is empty # [[phab:T383243|T383243]] [[phab:T421114|T421114]] * 12:17 hashar: Migrated Castor from integration-castor05 to integration-castor06. 
Updated CASTOR_HOST in Jenkins and moved the Cinder volume to the new instance # [[phab:T421114|T421114]] * 11:14 hashar: Added Bookworm based Jenkins agents to the pool Hostnames 1090, 1091, 1092 and 1093 # [[phab:T421114|T421114]] * 10:09 hashar: Added Bookworm based Jenkins agents to the pool Hostnames 1083 to 1089 # [[phab:T421114|T421114]] * 07:23 hashar: CI Jenkins: removed `blubber` label from all agents after having moved PipelineLib to use the `Docker` label {{!}} [[phab:T422234|T422234]] === 2026-04-06 === * 16:01 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/1268239 === 2026-04-03 === * 20:17 bd808: Unblock 2.54.0.0/16 ([[phab:T422238|T422238]]) * 17:25 bd808: Unblock 31.18.0.0/16 ([[phab:T422245|T422245]]) * 17:18 bd808: Unblock 2.54.128.0/19 ([[phab:T422238|T422238]]) * 16:18 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1264649 "add Python 3.14 to pywikibot jobs and separate lint tests" {{!}} [[phab:T421723|T421723]] * 09:26 hashar: integration: nuked pywikibot/core pre-commit cache # [[phab:T422242|T422242]] * 09:15 hashar: Added Bookworm based Jenkins agents to the pool with label `Docker`.
Hostnames are `integration-agent-docker-107*` # [[phab:T421114|T421114]] * 02:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1267398 === 2026-04-02 === * 16:50 thcipriani: restart jenkins * 15:15 bd808: Unblock 82.216.0.0/16 ([[phab:T421508|T421508]]) * 15:07 bd808: Unblock 95.90.0.0/15 ([[phab:T421485|T421485]]) * 11:19 James_F: Zuul: [oojs/ui] Drop ooui-ruby2.7-rake job, we're abandoning Ruby use there === 2026-04-01 === * 22:01 bd808: Unblock 109.144.0.0/12 ([[phab:T422019|T422019]]) * 20:16 bd808: Unblock 93.192.0.0/10 ([[phab:T421894|T421894]]) * 19:25 dancy: Updating buildkitd to v0.29.0 in gitlab-cloud-runners (prod) ([[phab:T415284|T415284]]) * 17:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/97 ([[phab:T420441|T420441]]) * 17:39 bd808: Unblock 94.134.0.0/15 ([[phab:T421866|T421866]]) * 16:31 dancy: Upgrade buildkit to 0.29.0 in staging gitlab-cloud-runners ([[phab:T415284|T415284]]) * 10:47 taavi: integration-castor05: free up a bit of disk space by deleting cache for AhoCorasick/ CLDRPluralRuleParser/ HtmlFormatter/ RelPath/ RunningStat/ IPSet/ === 2026-03-30 === * 22:01 bd808: Unblock 78.20.0.0/14 ([[phab:T421586|T421586]]) * 21:04 bd808: Unblock 95.88.0.0/15 ([[phab:T421774|T421774]]) * 20:49 bd808: Unblock 95.89.191.0/24 ([[phab:T421774|T421774]]) * 20:29 bd808: Unblock 73.162.0.0/16 ([[phab:T421549|T421549]]) * 13:10 hashar: gerrit: abandon mediawiki/core changes that are 2+years old and are attached to a task (`Bug: Txxxx`) * 11:37 hashar: Reloaded Zuul to add 3 persons to the allow list * 10:43 James_F: Docker: Re-pushing to try to create quibble-coverage 1.16.0-s2 === 2026-03-27 === * 21:00 James_F: Docker: [quibble-bullseye] Drop Python 2 from images * 11:28 hashar: deployment-prep: removed block for `143.176.0.0/15` and blocked subblock `143.176.0.0/16` instead.
This unblocks `143.177.0.0/16` # [[phab:T421420|T421420]] * 00:18 bd808: Unblock 95.90.238.0/23 ([[phab:T421447|T421447]]) === 2026-03-26 === * 21:25 bd808: Unblock 89.240.0.0/15 ([[phab:T421364|T421364]]) * 21:09 brennen: patchdemo: deploy to production for https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/merge_requests/312 === 2026-03-25 === * 20:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256318 [[phab:T421283|T421283]] * 15:46 dancy: Migrated gitlab-cloud-runners (prod) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 15:32 dancy: Migrated gitlab-cloud-runners (staging) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 10:01 hashar: Updating tox Jenkins jobs to add support for Python 3.14 {{!}} https://gerrit.wikimedia.org/r/1260632 {{!}} [[phab:T421209|T421209]] * 08:40 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20/ === 2026-03-24 === * 19:40 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255746 * 15:34 brennen: gitlab1004: manual test run of `configure-projects` with cleared issue allowlist ([[phab:T412882|T412882]]) * 15:26 bd808: Unblock 47.194.0.0/16 ([[phab:T421127|T421127]]) * 12:53 hashar: integration: deleted old Puppet 5 compiler agents from Jenkins ( pcc-worker1014.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1015.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1016.puppet-diffs.eqiad1.wikimedia.cloud ) # [[phab:T367399|T367399]] * 07:42 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1259755 === 2026-03-23 === * 15:28 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 90272 === 2026-03-22 === * 14:52 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1258082 * 01:00 Krinkle: Reloading Zuul to deploy 
https://gerrit.wikimedia.org/r/1256488 === 2026-03-21 === * 08:10 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256962 * 07:48 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256946 === 2026-03-20 === * 21:21 bd808: Unblock 103.159.218.0/24 ([[phab:T420530|T420530]]) * 14:59 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add dependency on CodeMirror, for [[phab:T399673|T399673]] === 2026-03-19 === * 16:54 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255777 * 16:01 Krinkle: Hoist l10n-bot rights from labs/tools parent to labs parent to reduce duplication in other labs/ repos * 15:50 Krinkle: Create labs/xtools repo (branch: main, parent: labs, owner: labs-xtools), ref [[phab:T402086|T402086]] === 2026-03-18 === * 21:11 dcausse: [[phab:T403775|T403775]]: reindexing all wikis to enable new sorting options * 21:08 dcausse: restarting opensearch on deployment-cirrussearch(12{{!}}13{{!}}14) instances to pickup new plugin versions * 14:56 James_F: Zuul: Handle wmf/next the same way as wmf/branch_cut_pretest * 14:52 James_F: Zuul: [GrowthExperiments] drop duplicate VisualEditor dep * 14:52 James_F: Zuul: [search/*] Add experimental Java 25 jobs === 2026-03-17 === * 22:50 James_F: Zuul: [mediawiki/extensions/JsonForms] Add quibble jobs * 21:27 James_F: Zuul: search: Update opensearch plugins for Java 11/17, for [[phab:T420407|T420407]] * 20:20 bd808: Resize deployment-sessionstore06 from g4.cores1.ram2.disk20 to g4.cores2.ram4.disk20 ([[phab:T415021|T415021]]) * 16:43 James_F: Zuul: [BlueSpicePermissionManager] Add …ConfigManager & …UserManager deps * 14:36 James_F: Zuul: [mediawiki/extensions/ArticleGuidance]: Add SpamBlacklist as phan dep, for [[phab:T420015|T420015]] === 2026-03-13 === * 13:59 andrewbogott: deleting ptr record 117.0.16.172.in-addr.arpa. 
-- accidental duplicate for deployment-kafka-logging01.deployment-prep.eqiad1.wikimedia.cloud * 13:04 elukey: re-create kafka-logging-01 in deployment-prep on trixie and Kafka 3.7 (was running on buster) * 09:13 elukey: upgrade kafka-jumbo and kafka-main to Confluent 7.7 in deployment-prep (pre-requisite before being able to upgrade to Trixie) === 2026-03-12 === * 21:23 bd808: Hard reboot deployment-sessionstore06 ([[phab:T415021|T415021]]) * 01:14 James_F: Docker: [helm-linter] Bump for Envoy 1.35.9, for [[phab:T419637|T419637]] === 2026-03-11 === * 16:48 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/MetricsPlatform # [[phab:T417568|T417568]] * 16:47 James_F: Zuul: [mediawiki/extensions/MetricsPlatform] Archive, for [[phab:T416865|T416865]] * 11:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1250529 "inference-services: Split policy violation CI into separate model jobs." - [[phab:T418832|T418832]] === 2026-03-10 === * 17:39 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner production * 17:11 hashar: Updated MediaWiki coverage jobs so that they now keep "Generate a local configuration by running `composer phpunit:config`" message # [[phab:T419073|T419073]] * 16:41 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner staging * 08:21 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 === 2026-03-09 === * 21:53 bd808: Reboot deployment-shellbox01 on the off chance that it makes the new permissions error go away ([[phab:T419440|T419440]]) * 13:13 James_F: Zuul: [mediawiki/extensions/WikiShare] Mark as archived, for [[phab:T413589|T413589]] * 13:11 James_F: Zuul: [mediawiki/extensions/Memento] Mark as archived, for [[phab:T369991|T369991]] * 13:10 James_F: Zuul: [mediawiki/extensions/QuickGV] Mark as archived, for [[phab:T413348|T413348]] * 13:10 James_F: Zuul: [mediawiki/extensions/SemanticImageInput] Mark as archived, for 
[[phab:T413588|T413588]] * 13:09 James_F: Zuul: [mediawiki/extensions/SidebarDonateBox] Mark as archived, for [[phab:T413587|T413587]] * 13:07 James_F: Zuul: [mediawiki/extensions/SemanticSifter] Mark as archived, for [[phab:T413586|T413586]] * 13:06 James_F: Zuul: [mediawiki/extensions/GoogleAdSense] Mark as archived, for [[phab:T413585|T413585]] * 13:04 James_F: Zuul: [mediawiki/extensions/SecurityAPI] Mark as archived, for [[phab:T418008|T418008]] * 12:50 James_F: Zuul: [mediawiki/extensions/CheckUser] Add DiscussionTools dependency * 12:50 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add dependencies for TestKitchen * 10:40 hashar: gerrit: mediawiki/vendor: converted `es6` and `es710` branches to tags # [[phab:T417804|T417804]] * 09:24 hashar: Updating Quibble jobs to 1.16.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1248880 {{!}} [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 09:15 hashar: updating all CI Jenkins jobs using `./jjb-update` === 2026-03-06 === * 19:46 James_F: Zuul: [mediawiki/services/geoshapes] Mark as archived, for [[phab:T418372|T418372]] * 16:37 hashar: Building Docker images for Quibble 1.16.0 * 16:31 hashar: Tag Quibble 1.16.0 @ {{Gerrit|0b9db5fe3cabb2cec0b5d44e128bafa917b3b895}} # [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 12:32 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248411 "jjb, Zuul: vary Wikibase Selenium for release branches" {{!}} [[phab:T418797|T418797]] * 12:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248409/ "jjb, Zuul: rename wikibase-selenium job for clarity" {{!}} [[phab:T418797|T418797]] === 2026-03-05 === * 14:41 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add TestKitchen as a dependency for [[phab:T418053|T418053]] * 08:01 hashar: Reloaded Zuul to rename wikibase-client / wikibase-repo jobs {{!}} https://gerrit.wikimedia.org/r/1238317 * 00:04 James_F: Docker: 
[quibble-coverage] Use local PHPUnit config, for [[phab:T345481|T345481]] === 2026-03-04 === * 21:16 James_F: Zuul: [mediawiki/core] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 21:10 James_F: Zuul: [mediawiki/vendor] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 19:48 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/96 ([[phab:T419004|T419004]]) * 18:50 James_F: Revert "Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency", for [[phab:T419043|T419043]] * 16:23 James_F: Zuul: [mediawiki/services/parsoid] Make PHP 8.4 voting * 15:37 James_F: Docker: [rake-ruby2.7] Add libffi-dev too, for [[phab:T418463|T418463]] * 13:59 James_F: Docker: [rake-ruby2.7] Add ruby-ffi for [[phab:T418463|T418463]] * 13:54 hashar: SIGKILL Zuul cause it can't gracefully stop most probably due to being locked attempting to report back to Gerrit # [[phab:T419009|T419009]] * 13:49 hashar: Stopping Zuul # [[phab:T419009|T419009]] * 13:41 hashar: Took a Zuul stack dump on contint1002.wikimedia.org using SIGUSR1 # [[phab:T419009|T419009]] === 2026-03-03 === * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] Drop MetricsPlatform phan dep * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Drop MetricsPlatform phan dep === 2026-03-02 === * 22:13 James_F: Zuul: Enforce PHP 8.4 in MW extensions and skins for development branch, for [[phab:T386108|T386108]] * 14:05 James_F: Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency, for [[phab:T415451|T415451]] * 13:48 James_F: Zuul: […/WikimediaEvents] Drop LoginNotify dependency, now unused, for [[phab:T404334|T404334]] * 10:16 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/15.8.2/ # [[phab:T418718|T418718]] === 
2026-02-28 === * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/skins` # [[phab:T418675|T418675]] * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/extensions` # [[phab:T418675|T418675]] === 2026-02-27 === * 15:53 dancy: Updating gitlab-cloud-runners (staging and prod) to gitlab-runner 18.9.0. === 2026-02-26 === * 20:16 James_F: Zuul: Provide a custom, high-priority pipeline just for puppet compiler [[phab:T414621|T414621]] * 19:32 James_F: Docker: Bump all the PHPs. * 13:40 hashar: Deployed Jenkins job https://integration.wikimedia.org/ci/job/wikibase-selenium/ # [[phab:T287582|T287582]] * 00:13 dduvall: forcing replacement of buildkitd helm release in gitlab-cloud-runner prod cluster due to dependency on removed k8s secret ([[phab:T416260|T416260]]) === 2026-02-25 === * 23:50 dduvall: deploying https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/552 to gitlab-cloud-runner production cluster ([[phab:T416260|T416260]]) * 14:07 James_F: Zuul: [mediawiki/extensions/CommunityRequests] Add TemplateData dependency, for [[phab:T401638|T401638]] * 00:08 jeena: no-op testing updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/95 === 2026-02-24 === * 15:55 brennen: devtools: test deploy phab/phorge to test instance ([[phab:T418256|T418256]]) === 2026-02-23 === * 23:07 jeena: Updated development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:43 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:12 bd808: Unblock 191.80.192.0/18 ([[phab:T418132|T418132]]) * 20:26 hashar: Deleted "replication-upstream" Grafana dashboard in favor of a copy/new "replication" one. 
https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream , replaced it by https://grafana.wikimedia.org/d/d4a4da73-c27f-4ce6-a9e5-ab84dd7a4ebb/replication * 16:29 James_F: Zuul: [3d2png] Add basic Node CI at version 20 === 2026-02-20 === * 21:47 bd808: Unblock 168.184.84.0/24 ([[phab:T418020|T418020]]) * 17:13 bd808: Unblock 122.187.64.0/18 ([[phab:T417964|T417964]]) * 14:35 James_F: Zuul: [mediawiki/extensions/Monstranto] Move out of Wikimedia prod section === 2026-02-19 === * 18:34 bd808: Unblock 181.98.0.0/16 ([[phab:T417890|T417890]]) * 17:21 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Add AbuseFilter as a dependency, for [[phab:T417799|T417799]] * 13:22 hashar: Reloaded Zuul to archive the Cergen repository {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1240688 {{!}} [[phab:T417887|T417887]] === 2026-02-18 === * 20:17 jeena: Updating development images on contint primary for [[phab:T415922|T415922]] * 19:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240360 * 18:40 bd808: Unblock 46.59.0.0/17 ([[phab:T417747|T417747]]) * 17:05 hashar: Regenerating Jenkins jobs with JJB based on https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 17:04 hashar: Added EXT_DEPENDENCIES to Quibble Jenkins jobs parameters so we can manually trigger them from the Web UI using a different set of deps # https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 16:30 hashar: Triggered https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ with empty Zuul parameters introduced by https://gerrit.wikimedia.org/r/1240333 {{!}} https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/4893/console * 15:43 James_F: Zuul: [mediawiki/extensions/ReadingLists] Add EventBus dependency for [[phab:T417706|T417706]] * 12:15 hashar: zuul-1001.zuul3.eqiad1.wikimedia.cloud: added keepalive=20 to the scheduler Gerrit driver and restarted scheduler container # [[phab:T417497|T417497]] * 06:58 jeena: 
Updating development images on contint primary for [[phab:T415922|T415922]] === 2026-02-17 === * 23:37 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240081 * 23:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240078 * 15:58 brennen: deployed latest phab/phorge wmf/stable to devtools test instance ([[phab:T417657|T417657]]) * 09:01 hashar: Reloaded Zuul to enable php 8.5 testing on utfnormal, php-session-serializer, wikipeg, mediawiki/libs/Dodo, mediawiki/libs/UUID, testing-access-wrapper and translatewiki # [[phab:T406326|T406326]] === 2026-02-16 === * 15:27 hashar: Manually cleaned some old workspaces on integration-agent-docker-1042 === 2026-02-12 === * 20:07 James_F: Zuul: Enable PHP 8.5 jobs for most MW libraries, for [[phab:T406326|T406326]] * 19:33 James_F: Docker: [php83] Re-build with upstream's new 8.3.30 release and cascade * 19:31 James_F: Zuul: Add PHP 8.5 CI job to various things noted as blocked by Phan, for [[phab:T410941|T410941]], [[phab:T406326|T406326]] * 16:35 Krinkle: Disable publishing noise on tasks from repos Bcp47, clover-diff, ScopedCallback, and IDLeDOM. 
Ref [[phab:T143162|T143162]] * 15:53 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/87 * 11:21 James_F: Zuul: [mediawiki/libs/shellbox] Add direct Phan job, for [[phab:T416064|T416064]] === 2026-02-10 === * 20:16 dancy: Rebooted k3s.catalyst-dev (it was unresponsive, but the reboot hasn't helped) === 2026-02-09 === * 21:58 James_F: Zuul: [mediawiki/tools/phan] Add PHP 8.5 CI job, for [[phab:T410941|T410941]] * 19:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1238006 [[phab:T415680|T415680]] * 11:51 James_F: Zuul: [mediawiki/extensions/ReadingLists] Drop MetricsPlatform dependency, for [[phab:T414435|T414435]] === 2026-02-05 === * 17:58 James_F: Zuul: […/WikimediaCustomizations] Add six new dependencies for [[phab:T404334|T404334]] * 15:35 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1237254 * 15:18 James_F: Zuul: […/OATHAuth] Add dependency and phan dependency on CentralAuth === 2026-02-04 === * 12:54 James_F: Zuul: [mediawiki/extensions/Petition] Add CLDR dependency * 10:03 hashar: Restarted Jenkins on releases2003.codfw.wmnet === 2026-02-02 === * 21:17 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1234926 "re-enable master jobs for some BlueSpice repos - [[phab:T403196|T403196]]" * 21:05 bd808: Unblock 85.146.0.0/17 ([[phab:T416079|T416079]]) * 19:47 James_F: Zuul: […/WikimediaCustomizations] Add cldr phan dependency, for [[phab:T404334|T404334]] * 17:33 bd808: Unblock 188.188.0.0/15 ([[phab:T416095|T416095]]) * 17:26 bd808: Unblock 85.94.84.0/22 ([[phab:T416105|T416105]]) * 17:09 bd808: Unblock 94.234.0.0/16 ([[phab:T416165|T416165]]) * 16:51 dancy: Update gitlab-runners to alpine-v18.6.6 ([[phab:T415214|T415214]]) * 16:27 bd808: Unblock 47.231.208.0/21 ([[phab:T416010|T416010]]) * 11:39 James_F: Zuul: […/WikimediaCustomizations] Add five new phan dependencies, for [[phab:T404334|T404334]] * 09:45 
Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 58532, 58557 === 2026-01-31 === * 21:49 James_F: Deleted Jenkins's job entry for castor-save-workspace-cache {{Gerrit|6193776}} and this seems to have unstuck things for [[phab:T416078|T416078]]? * 21:45 James_F: Running `sudo systemctl restart jenkins` on contint for [[phab:T416078|T416078]] * 21:44 James_F: Fighting [[phab:T416078|T416078]], took integration-castor-5 offline, disconnected, sshed in to kill threads, then reconnected; no change in aspect. * 19:03 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1235380 === 2026-01-28 === * 21:26 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn # [[phab:T415832|T415832]] * 21:11 bd808: Unblock 181.160.0.0/15 & 186.40.128.0/17 ([[phab:T415820|T415820]]) * 17:01 bd808: Unblock 102.182.0.0/16 ([[phab:T415782|T415782]]) === 2026-01-27 === * 16:45 James_F: Zuul: Switch skin-quibble template with identical extension-quibble, for [[phab:T402398|T402398]] * 16:18 James_F: Zuul: [ArticleGuidance] mention it will be in production * 15:55 James_F: Docker: [quibble-bullseye] Update to Quibble 1.15.0 * 15:12 James_F: Docker: [quibble-coverage] Pass PHPUnit config location explicitly, for [[phab:T395470|T395470]] * 09:18 hashar: integration: on integration-castor05, deleted caches for old MediaWiki branches * 09:15 hashar: integration: on pkgbuilder instances, removed Buster cow images, aptcache and hooks. 
`sudo cumin --force -p 0 'name:pkgbuilder' 'rm -fR /srv/pbuilder/<nowiki>{</nowiki>base-buster-amd64.cow,hooks/buster,aptcache/buster-amd64<nowiki>}</nowiki>'` # [[phab:T397209|T397209]] * 09:14 hashar: integration: cleaned up old workspaces under /srv/jenkins/workspace === 2026-01-26 === * 23:27 bd808: Unblock 66.130.0.0/15 ([[phab:T415596|T415596]]) * 22:52 bd808: Unblock 45.16.0.0/12 ([[phab:T415467|T415467]]) * 14:46 hashar: gerrit: changed `operations/software/permissions` project type from `CODE` to `PERMISSIONS` by pointing `HEAD` to `refs/meta/config` === 2026-01-22 === * 17:36 James_F: Docker: [quibble-coverage] Stop using legacy PHPUnit entrypoint ([[phab:T395470|T395470]]) & Stop excluding Dump/ParserFuzz/Stub groups ([[phab:T415230|T415230]]) * 15:11 James_F: Zuul: [mediawiki/extensions/Math] Add a standalone job, for [[phab:T415230|T415230]] === 2026-01-20 === * 20:38 bd808: Cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229186 ([[phab:T415113|T415113]]) * 19:05 bd808: Rebooted deployment-cache-text08 to see if the mystery haproxy startup failure would go away ([[phab:T415100|T415100]]) * 18:50 bd808: Unblock 152.7.0.0/16 ([[phab:T415100|T415100]]) === 2026-01-17 === * 23:32 ori: beta-scap with `php_l10n: true` completed successfully: https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/241466/console. PHP l10n files generated. Reverted local change to scap.cfg. * 23:26 ori: Temporarily set `php_l10n: true` on deployment-deploy04:/etc/scap.cfg to see if next scap succeeds. 
=== 2026-01-16 === * 16:33 dancy: Deleting deployment-mx03.deployment-prep ([[phab:T412975|T412975]]) === 2026-01-15 === * 14:50 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/ArticleSummaries/ # [[phab:T413232|T413232]] === 2026-01-14 === * 17:14 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226907 * 16:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226893 * 15:57 bd808: Unblock 190.60.63.0/24 ([[phab:T414541|T414541]]) === 2026-01-13 === * 15:04 James_F: Zuul: Make quibble-for-mediawiki-core-vendor-mysql-php84 voting, for [[phab:T386108|T386108]] === 2026-01-12 === * 21:33 zabe: zabe@deployment-mwmaint03:~$ foreachwiki migrateLinksTable.php --table imagelinks # [[phab:T413668|T413668]] * 21:06 bd808: Unblock 66.81.168.0/21 ([[phab:T414303|T414303]]) * 17:42 dancy: Turned off instance deployment-prep.deployment-mx03 * 11:44 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 46331, 46344 === 2026-01-10 === * 21:48 taavi: reload zuul for https://gerrit.wikimedia.org/r/1224782 * 00:25 bd808: Unblock 91.160.0.0/12 ([[phab:T414190|T414190]]) === 2026-01-09 === * 17:33 thcipriani: re-enabling beta update jobs after test bad extension-list [[phab:T411516|T411516]] * 17:09 thcipriani: disabling beta update jobs to test bad extension-list [[phab:T411516|T411516]] === 2026-01-08 === * 21:30 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224815 [[phab:T414136|T414136]] * 18:24 bd808: Unblock 89.80.0.0/12 ([[phab:T414113|T414113]]) * 15:55 dancy: Upgrading gitlab-runner to v18.5.0 on gitlab-cloud-runners. 
([[phab:T414053|T414053]]) === 2026-01-07 === * 23:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1082574 https://gerrit.wikimedia.org/r/1224157 https://gerrit.wikimedia.org/r/1224159 * 23:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/896311 [[phab:T27482|T27482]] * 23:06 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224218 * 17:34 James_F: Zuul: Add new extensions: IssueTrackerLinks, PreviewLinks, and WikiRAG * 17:34 James_F: Zuul: [labs/tools/heritage] Point to the task to drop 8.1 testing * 15:09 James_F: Zuul: [labs/tools/heritage] Add testing in PHP 8.2+, not just PHP 8.1 * 15:03 James_F: Zuul: Even for extension-broken, don't offer PHP 8.1 testing * 15:02 James_F: Zuul: Move quibble experimental sqlite/postgres tests to PHP 8.3 === 2026-01-06 === * 16:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223690 [[phab:T411814|T411814]] * 16:16 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223189 [[phab:T411814|T411814]] * 00:30 bd808: Unblock 85.134.128.0/17 ([[phab:T413755|T413755]]) * 00:02 bd808: Unblock 89.166.128.0/17 ([[phab:T413702|T413702]]) === 2026-01-05 === * 23:57 bd808: Unblock 185.233.104.0/22 ([[phab:T413472|T413472]]) * 23:51 bd808: Unblock 45.62.112.0/21 ([[phab:T413079|T413079]]) * 23:44 bd808: Unblock 85.134.200.0/21 ([[phab:T413067|T413067]]) * 19:03 dancy: Updated buildkitd to v0.26.3 in gitlab-cloud-runners * 14:27 taavi: reload zuul for {{Gerrit|1223191}} * 13:57 James_F: Zuul: [mediawiki/php/wmerrors] Enable PHP 8.5 testing, for [[phab:T410921|T410921]] === 2026-01-03 === * 17:59 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222709 https://gerrit.wikimedia.org/r/1220388 https://gerrit.wikimedia.org/r/1219140 === 2026-01-02 === * 17:10 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222597 === 2026-01-01 === * 02:34 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1221644 <noinclude>'''Server Admin 
Log''' logged from {{IRC|wikimedia-releng}} for [[Nova Resource:Deployment-prep|Beta Cluster]], [[mw:Continuous integration|Continuous integration]] and various other Release Engineering projects.</noinclude> {{SAL-archives/Release Engineering}} <noinclude>[[Category:SAL]]</noinclude> 4q6m10uwuc8kjk7u34e85pst3cpk8bq 2414284 2414283 2026-05-15T18:11:37Z Stashbot 7414 dancy: Upgraded gitlab-cloud-runners (staging) from 1.35.1-do.5 to 1.35.1-do.6 (T426436) 2414284 wikitext text/x-wiki === 2026-05-15 === * 18:11 dancy: Upgraded gitlab-cloud-runners (staging) from 1.35.1-do.5 to 1.35.1-do.6 ([[phab:T426436|T426436]]) * 17:59 dancy: Upgrading gitlab-cloud-runners (staging) from 1.35.1-do.5 to 1.35.1-do.6 ([[phab:T426436|T426436]]) * 13:02 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1287828 [[phab:T426392|T426392]] === 2026-05-13 === * 12:42 James_F: Zuul: [mediawiki/extensions/Springboard] Add AdminLinks Phan dependency * 12:42 James_F: Zuul: [mediawiki/extensions/ChatBot] Add dependencies on VisualEditor and BlueSpiceFoundation * 12:42 James_F: Zuul: [mediawiki/extensions/ChatIntegration] Add dependency on VisualEditor * 12:37 James_F: Zuul: [mediawiki/extensions/WikiLambda] Drop AF and SB deps down to phan-only, for [[phab:T423180|T423180]] === 2026-05-12 === * 20:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/104 ([[phab:T424774|T424774]]) * 18:08 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add AF and SB deps for [[phab:T423180|T423180]] * 14:18 atsukoito: PrivateSettings: empty $wgOpensearchCredentials for opensearch-on-k8s synced to deploy04 by Reedy * 13:04 atsukoito: PrivateSettings: credentials for opensearch-on-k8s ttmserver-test * 11:50 James_F: Zuul: [machinelearning/liftwing/inference-services] Add qwen36 llm model CI/CD pipelines, for [[phab:T425680|T425680]] * 11:46 James_F: Zuul: Add experimental php-pie-build* jobs to other PHP extensions, for 
[[phab:T425943|T425943]] * 11:37 James_F: Zuul: [mediawiki/php/wikidiff2] Add experimental php-pie-build* jobs, for [[phab:T425943|T425943]] * 10:05 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-with-Wikibase-extensions-browser-tests-only-vendor-php83 # fix failure seen in quibble-with-Wikibase-extensions-browser-tests-only-vendor-php83 7817 * 08:44 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/ # broken Cypress cache? hopefully fix failure seen in quibble-vendor-mysql-php83-selenium 51633 === 2026-05-11 === * 18:28 James_F: Docker: Add changes to php-compile images for PIE, for [[phab:T425943|T425943]] * 16:06 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/ # broken Cypress cache? 
hopefully fix failure seen in quibble-vendor-mysql-php83-selenium 51439 and 51452 === 2026-05-09 === * 20:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1285498 === 2026-05-07 === * 22:53 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/105 === 2026-05-06 === * 18:13 bd808: Unblock 88.165.192.0/19 * 18:03 bd808: Unblock 94.208.0.0/14 * 17:56 bd808: Unblock 84.226.0.0/16 * 17:41 bd808: Unblock 94.34.0.0/16 * 17:35 bd808: Unblock 109.134.0.0/16 === 2026-05-05 === * 21:20 James_F: Zuul: Provide Node 26 experimental jobs everywhere needed * 21:04 James_F: Docker: Provide initial Node 26 images * 19:01 James_F: Zuul: [mediawiki/extensions/PageAssessments] Add Scribunto dependency, for [[phab:T396135|T396135]] * 14:58 dancy: rm /var/log/<nowiki>{</nowiki>user.log.1,syslog.1,messages.1<nowiki>}</nowiki> on deployment-eventgate-4.deployment-prep ([[phab:T425429|T425429]]) === 2026-05-04 === * 15:19 dancy: Upgrading gitlab cloud runners (prod) from 1.35.1-do.3 to 1.35.1-do.5 * 14:51 dancy: Upgrading gitlab cloud runners (staging) from 1.35.1-do.3 to 1.35.1-do.5 * 10:40 James_F: Zuul: Provide non-voting PHP 8.4/8.5 Quibble jobs for bluespice template === 2026-05-02 === * 20:49 James_F: Zuul: [mediawiki/core] Enforce PHP 8.4 & 8.5 on release branches, all pass * 19:27 James_F: Zuul: Provide non-voting PHP 8.4/8.5 Quibble jobs for MW release branches * 19:19 James_F: Zuul: [mediawiki/extensions/BlogPage] Add dependencies * 16:48 James_F: Hard-restarting Zuul to clear the huge number of i18n updates being re-submitted. 
* 15:48 James_F: Zuul: [wikimedia-cz/*] Test in PHP 8.3+, dropping 8.2 * 14:02 TheresNoTime: Add bvibber to deployment-prep project * 09:08 James_F: Docker: [quibble-*] Add php-luasandbox so we can test both modes in Scribunto === 2026-05-01 === * 15:42 James_F: Zuul: [wikimedia/lucene-explain-parser] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: Zuul: [wikimedia/textcat] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: Zuul: [mediawiki/tools/ParseWiki] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: zuul: Add ToprakM to CI allowlist * 15:19 James_F: Zuul: [translatewiki] Test in PHP 8.3+, dropping 8.2 * 15:10 James_F: Zuul: [mediawiki/extensions/WikiEditor] Add TestKitchen as a dependency, for [[phab:T425076|T425076]] * 12:40 James_F: Zuul: [mediawiki/tools/code-utils] Test in PHP 8.3+, dropping 8.2 * 08:02 James_F: Zuul: Update xtex's e-mail in the allowlist * 07:37 James_F: Zuul: Switch release branches' selenium jobs to PHP 8.3 * 07:33 James_F: Zuul: Test Wikimedia production libraries in PHP 8.3+, dropping 8.2 === 2026-04-30 === * 21:36 brennen: gitlab-webhooks: building & restarting to deploy https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/40 * 20:26 James_F: Zuul: [mediawiki/tools/api-testing] Make PHP 8.5 CI voting * 20:16 James_F: jforrester@doc1004:~$ # sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn/ # [[phab:T415832|T415832]] * 20:14 James_F: Zuul: [mediawiki/extensions/WebAuthn] Archive, for [[phab:T415832|T415832]] / [[phab:T303495|T303495]] * 17:16 brennen: wikibugs: most maintainers at hackathon, so go release-engineering added as a maintainer while looking to debug error at https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/jobs/810904 * 15:19 mutante: upgrading zuul to 14.2.0-1 on "new zuul" machines ([[phab:T424879|T424879]]) === 2026-04-29 === * 15:49 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add ConfirmEdit dependency, for [[phab:T424597|T424597]] * 15:36 James_F: Zuul: Drop 
experimental node22 jobs, never used in practice * 15:28 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1279392, https://gerrit.wikimedia.org/r/1279397 === 2026-04-28 === * 18:11 bd808: Unblock 86.0.0.0/16 * 17:41 bd808: Unblock 79.192.0.0/10 * 17:07 James_F: Zuul: [mediawiki/tools/phpunit-patch-coverage] Drop PHP 8.2 testing * 17:07 James_F: Zuul: [mediawiki/tools/minus-x] Drop PHP 8.2 testing * 17:07 James_F: Zuul: [mediawiki/tools/codesniffer] Drop PHP 8.2 testing * 16:32 James_F: Zuul: [mediawiki/services/jobrunner] Drop PHP 8.2 testing * 13:34 James_F: Zuul: [mediawiki/tools/phan] Drop PHP 8.2 testing * 13:34 James_F: Zuul: [oojs/ui] Drop PHP 8.2 testing * 13:14 James_F: Zuul: [mediawiki/tools/phan/SecurityCheckPlugin] Drop PHP 8.2 CI * 10:40 Silvan_WMDE: sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwext-node24-rundoc/ # run on integration-castor06.integration.eqiad1.wikimedia.cloud to fix failure seen in mwext-node24-rundoc #1717 * 00:03 bd808: Increase parallelism for wmf-beta-update-databases.py ([[phab:T256168|T256168]]) === 2026-04-27 === * 22:11 bd808: Beta Cluster MediaWiki update logs now available via https://beta-update.wmcloud.org/ ([[phab:T256168|T256168]]) * 21:57 bd808: Add web security group to deployment-deploy04 ([[phab:T256168|T256168]]) * 20:45 James_F: Zuul: Restrict mw*-codehealth-patch jobs to master only, for [[phab:T424573|T424573]] * 17:16 James_F: Docker: [mediawiki-phan-taint-check-demo] Re-platform to Trixie and so PHP 8.4 * 15:53 James_F: Zuul: [mediawiki/extensions/ReportIncident] Add TestKitchen phan dependency, for [[phab:T424220|T424220]] * 14:32 James_F: Zuul: Drop PHP 8.2 enforcement from MediaWiki things for master and REL1_46 for [[phab:T358667|T358667]] * 12:38 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mwext-node24-docs-publish # fix failure seen in mwext-node24-docs-publish 
383 * 09:18 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover/mediawiki-libs-node-cssjanus/ # [[phab:T424419|T424419]] === 2026-04-26 === * 20:49 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1276777 === 2026-04-24 === * 22:48 dduvall: merged zuul3 branch of integration/config into master and pushed (in preparation for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1277198) * 12:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1276428 === 2026-04-23 === * 23:57 bd808: Set `profile::beta::autoupdater::run_updater: true` for deployment-deploy04 via Horizon ([[phab:T256168|T256168]]) * 22:58 bd808: bd808@deployment-deploy04 `sudo -u jenkins-deploy /usr/local/bin/wmf-beta-update-all` * 22:36 bd808: bd808@deployment-deploy04 `sudo -u mwdeploy /usr/local/bin/wmf-beta-update-all` * 22:16 bd808: Disabled https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad so that replacement script can be tested ([[phab:T256168|T256168]]) * 22:12 bd808: Disabled https://integration.wikimedia.org/ci/job/beta-code-update-eqiad so that replacement script can be tested ([[phab:T256168|T256168]]) * 22:02 bd808: Cherry-picked {{gerrit|1276813}} to deployment-puppetserver-1 ([[phab:T256168|T256168]]) * 20:11 James_F: Zuul: [wikibase/*] Replace CI testing in Node 20 with Node 24 * 20:11 James_F: Zuul: [wikidata/query/*] Replace CI testing in Node 20 with Node 24 * 20:11 James_F: Zuul: [analytics/*] Replace CI testing in Node 20 with Node 24 * 20:10 James_F: Zuul: [mediawiki/tools/*] Replace CI testing in Node 20 with Node 24 * 20:06 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.34.5-do.3 to 1.35.1-do.3 ([[phab:T423726|T423726]]) * 19:55 James_F: Zuul: [jquery-client] Replace CI testing in Node 20 with Node 24 * 19:51 James_F: Zuul: [wikipeg] Drop testing in Node 20 and Node 22 * 19:47 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.34.5-do.3 to 1.35.1-do.3 ([[phab:T423726|T423726]]) * 
19:37 James_F: Zuul: [oojs/ui] Drop CI testing in Node 20 and Node 22 * 19:37 James_F: Zuul: [oojs/js] Drop CI testing in Node 20 and Node 22 * 19:37 James_F: Zuul: [unicodejs] Replace CI testing in Node 20 with Node 24 * 19:36 James_F: Zuul: [wikimedia/portals] Drop CI testing in Node 20 and Node 22 * 18:57 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.33.9-do.3 to 1.34.5-do.3 ([[phab:T423726|T423726]]) * 18:39 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.33.9-do.3 to 1.34.5-do.3 ([[phab:T423726|T423726]]) * 18:18 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.33.9-do.2 to 1.33.9-do.3 ([[phab:T423726|T423726]]) * 17:58 James_F: Zuul: [mediawiki/extensions/OAuth] Add dependency on CentralAuth, for [[phab:T415281|T415281]] * 17:56 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.32.13-do.2 to 1.33.9-do.3 ([[phab:T423726|T423726]]) * 16:35 James_F: Zuul: Enforce PHP 8.5 CI for MW things in master (and REL1_46), for [[phab:T411814|T411814]] * 16:19 James_F: Zuul: [mediawiki/services/parsoid] Enable PHP 8.5 CI * 15:47 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add AntiSpoof dependency, for [[phab:T420548|T420548]] * 14:20 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24 # fix failure seen in mediawiki-node24 8385 and 8405 * 12:56 James_F: Zuul: [mediawiki/extensions/GrowthExperiments] Add CentralNotice dependency, for [[phab:T422082|T422082]] === 2026-04-22 === * 00:07 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add MF dependency, for [[phab:T424113|T424113]] === 2026-04-21 === * 23:26 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add CommunityConfiguration dep too, for [[phab:T394410|T394410]] * 23:17 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add standalone test jobs, for [[phab:T422031|T422031]] * 20:47 inflatador: updating cirrussearch hosts to Trixie/OpenSearch 2 
[[phab:T421763|T421763]] * 20:38 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add CommunityConfiguration phan dep, for [[phab:T394410|T394410]] * 20:17 bd808: Running tofu for [[phab:T421244|T421244]] * 18:00 James_F: Zuul: [mediawiki/extensions/WatchAnalytics] Add ApprovedRevs Phan dependency * 16:35 bd808: Unblock 79.116.0.0/16 * 13:34 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add TestKitchen phan dep, for [[phab:T415254|T415254]] * 13:27 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add CentralAuth dependency, for [[phab:T420548|T420548]] === 2026-04-20 === * 23:56 bd808: Unblock 76.157.0.0/16 * 18:28 dancy: Upgrading gitlab cloud runners (staging) to 1.33.9-do.2 ([[phab:T423726|T423726]]) * 18:28 dancy: Upgrading gitlab cloud runners (staging) ([[phab:T423726|T423726]]) * 18:19 James_F: jjb: All 486 (!) jobs now updated for [[phab:T423622|T423622]] * 18:18 bd808: Unblock 113.128.0.0/15 * 15:03 James_F: Docker: Bump ci-bullseye/-bookworm/-trixie for mirrors.wm.org removal, [[phab:T423622|T423622]] === 2026-04-19 === * 19:53 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272752 === 2026-04-17 === * 21:07 thcipriani: marking integration-agent-1080 offline for experimentation * 19:30 thcipriani: reconfiguring castor-save-workspace-cache with https://gerrit.wikimedia.org/r/1273935 * 17:47 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.32.10-do.1 to 1.32.13-do.2 ([[phab:T423726|T423726]]) * 16:49 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.32.10-do.1 to 1.32.13-do.2 ([[phab:T423726|T423726]]) === 2026-04-16 === * 20:49 dduvall: creating integration/zuul-jobs repo to serve as a mirror of opendev.org/zuul/zuul-jobs ([[phab:T406384|T406384]]) * 13:38 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272711 [[phab:T423568|T423568]] * 11:07 Silvan_WMDE: sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24/ # run on 
integration-castor06.integration.eqiad1.wikimedia.cloud === 2026-04-15 === * 20:05 James_F: Zuul: Configure REL1_46 CI, for [[phab:T423257|T423257]] * 17:44 bd808: Unblock 176.0.0.0/13 * 17:39 bd808: Unblock 46.128.0.0/16 * 17:32 bd808: Unblock 176.86.0.0/16 * 16:39 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/commit/127d783b2176ac60b646a5fa4f1b1a872ca66340 * 15:33 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/100 * 01:02 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/99 === 2026-04-14 === * 20:42 James_F: Docker: [composer-scratch] Upgrade composer to 2.9.7 and cascade * 16:35 bd808: Unblock 88.112.0.0/14 * 00:48 bd808: Unblock 24.6.0.0/16 * 00:42 bd808: Unblock 152.231.48.0/20 === 2026-04-13 === * 22:00 James_F: Zuul: [mediawiki/vendor] Drop accidental Wikibase browser tests on branches * 20:28 James_F: Zuul: [mediawiki/extensions/Chart] Drop Doxygen publish job, not used * 14:42 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add FlaggedRevs dep, for [[phab:T421011|T421011]] === 2026-04-12 === * 18:21 James_F: jforrester@contint1002:~$ sudo /usr/sbin/service zuul restart && tail -f -n100 /var/log/zuul/zuul.log # [[phab:T423027|T423027]] === 2026-04-10 === * 23:22 James_F: jforrester@contint1002:~$ zuul enqueue --trigger gerrit --pipeline postmerge --project mediawiki/extensions/ReadingLists --change {{Gerrit|1269498}},2 # [[phab:T422976|T422976]] * 23:20 James_F: Zuul: [mediawiki/extensions/ReadingLists] Publish JS coverage, for [[phab:T422976|T422976]] * 23:13 James_F: Zuul: Migrate a few straggler Node 20 MediaWiki things to Node 24 * 23:01 James_F: Zuul: Move all MediaWiki things from mediawiki-node20 to mediawiki-node24 * 21:59 James_F: Docker: Bump Node base images to March releases and cascade; Upgrade Quibble 
images from Node 20 to Node 24 * 10:24 hashar: Updating all Quibble jobs to 1.17.1 * 10:22 hashar: Updated PostgreSQL jobs to Quibble 1.17.1 # [[phab:T422110|T422110]] * 10:22 hashar: Updated apitesting job to Quibble 1.17.1 # [[phab:T422843|T422843]] [[phab:T418743|T418743]] * 09:51 hashar: Tag Quibble 1.17.1 @ {{Gerrit|0a1ab3b7c3dfee36c9bc2e9b049957d94e190e85}} === 2026-04-09 === * 15:13 hashar: Rolling back Quibble jobs to 1.16.0 (api-testing stage fails due to missing npm install step) * 14:58 hashar: Upgrading Quibble jobs to 1.17.0 * 14:23 hashar: Tagged Quibble 1.17.0 @ {{Gerrit|864381c6b63bdbcd8c74a3162c406fffcaaf8694}} * 07:48 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1268559 "Zuul: use standalone jobs for GrowthExperiments Cypress tests" {{!}} [[phab:T417412|T417412]] === 2026-04-08 === * 22:19 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/1269068 * 22:01 bd808: Unblock 95.216.12.170/32 ([[phab:T422751|T422751]]) * 19:26 brennen: gitlab-webhooks: building & deploying https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/37 - hitting some build tooling stuff, trying a fix per instructions in the error log * 17:54 bd808: Unblock 167.56.0.0/13 ([[phab:T422721|T422721]]) * 06:31 hashar: Deleted integration-agent-castor05 Bullseye instance, replaced by integration-agent-castor06 which is on Bookworm # [[phab:T421114|T421114]] * 06:24 hashar: Deleted integration-agent-qemu-1003 Bullseye image, replaced by integration-agent-qemu-1004 which is on Bookworm # [[phab:T422488|T422488]] === 2026-04-07 === * 22:25 dduvall: adding new pipelinelib labels to ci nodes ([[phab:T422234|T422234]]) * 20:05 hashar: Triggered a build of https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/ * 17:06 dduvall: added `Docker` label to `contint` jenkins nodes ([[phab:T422507|T422507]]) * 17:05 dduvall: restored missing `pipelinelib` labels on 
`integration-agent-docker-` CI hosts ([[phab:T422507|T422507]]) * 16:53 bd808: Unblock 73.0.0.0/8 ([[phab:T422498|T422498]]) * 12:36 hashar: jjb: use $CASTOR_HOST for Quibble success cache. https://gerrit.wikimedia.org/r/1268545 {{!}} This causes the Quibble jobs to use a new instance for the success cache, which is empty # [[phab:T383243|T383243]] [[phab:T421114|T421114]] * 12:17 hashar: Migrated Castor from integration-castor05 to integration-castor06. Updated CASTOR_HOST in Jenkins and moved the Cinder volume to the new instance # [[phab:T421114|T421114]] * 11:14 hashar: Added Bookworm based Jenkins agents to the pool Hostnames 1090, 1091, 1092 and 1093 # [[phab:T421114|T421114]] * 10:09 hashar: Added Bookworm based Jenkins agents to the pool Hostnames 1083 to 1089 # [[phab:T421114|T421114]] * 07:23 hashar: CI Jenkins: removed `blubber` label from all agents after having moved PipelineLib to use the `Docker` label {{!}} [[phab:T422234|T422234]] === 2026-04-06 === * 16:01 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/1268239 === 2026-04-03 === * 20:17 bd808: Unblock 2.54.0.0/16 ([[phab:T422238|T422238]]) * 17:25 bd808: Unblock 31.18.0.0/16 ([[phab:T422245|T422245]]) * 17:18 bd808: Unblock 2.54.128.0/19 ([[phab:T422238|T422238]]) * 16:18 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1264649 "add Python 3.14 to pywikibot jobs and separate lint tests" {{!}} [[phab:T421723|T421723]] * 09:26 hashar: integration: nuked pywikibot/core pre-commit cache # [[phab:T422242|T422242]] * 09:15 hashar: Added Bookworm based Jenkins agents to the pool with label `Docker`. 
Hostnames are `integration-agent-docker-107*` # [[phab:T421114|T421114]] * 02:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1267398 === 2026-04-02 === * 16:50 thcipriani: restart jenkins * 15:15 bd808: Unblock 82.216.0.0/16 ([[phab:T421508|T421508]]) * 15:07 bd808: Unblock 95.90.0.0/15 ([[phab:T421485|T421485]]) * 11:19 James_F: Zuul: [oojs/ui] Drop ooui-ruby2.7-rake job, we're abandoning Ruby use there === 2026-04-01 === * 22:01 bd808: Unblock 109.144.0.0/12 ([[phab:T422019|T422019]]) * 20:16 bd808: Unblock 93.192.0.0/10 ([[phab:T421894|T421894]]) * 19:25 dancy: Updating buildkitd to v0.29.0 in gitlab-cloud-runners (prod) ([[phab:T415284|T415284]]) * 17:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/97 ([[phab:T420441|T420441]]) * 17:39 bd808: Unblock 94.134.0.0/15 ([[phab:T421866|T421866]]) * 16:31 dancy: Upgrade buildkit to 0.29.0 in staging gitlab-cloud-runners ([[phab:T415284|T415284]]) * 10:47 taavi: integration-castor05: free up a bit of disk space by deleting cache for AhoCorasick/ CLDRPluralRuleParser/ HtmlFormatter/ RelPath/ RunningStat/ IPSet/ === 2026-03-30 === * 22:01 bd808: Unblock 78.20.0.0/14 ([[phab:T421586|T421586]]) * 21:04 bd808: Unblock 95.88.0.0/15 ([[phab:T421774|T421774]]) * 20:49 bd808: Unblock 95.89.191.0/24 ([[phab:T421774|T421774]]) * 20:29 bd808: Unblock 73.162.0.0/16 ([[phab:T421549|T421549]]) * 13:10 hashar: gerrit: abandon mediawiki/core changes that are 2+years old and are attached to a task (`Bug: Txxxx`) * 11:37 hashar: Reloaded Zuul to add 3 persons to the allow list * 10:43 James_F: Docker: Re-pushing to try to create quibble-coverage 1.16.0-s2 === 2026-03-27 === * 21:00 James_F: Docker: [quibble-bullseye] Drop Python 2 from images * 11:28 hashar: deployment-prep: removed block for `143.176.0.0/15` and blocked subblock `143.176.0.0/16` instead. 
This unblocks `143.177.0.0/16` # [[phab:T421420|T421420]] * 00:18 bd808: Unblock 95.90.238.0/23 ([[phab:T421447|T421447]]) === 2026-03-26 === * 21:25 bd808: Unblock 89.240.0.0/15 ([[phab:T421364|T421364]]) * 21:09 brennen: patchdemo: deploy to production for https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/merge_requests/312 === 2026-03-25 === * 20:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256318 [[phab:T421283|T421283]] * 15:46 dancy: Migrated gitlab-cloud-runners (prod) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 15:32 dancy: Migrated gitlab-cloud-runners (staging) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 10:01 hashar: Updating tox Jenkins jobs to add support for Python 3.14 {{!}} https://gerrit.wikimedia.org/r/1260632 {{!}} [[phab:T421209|T421209]] * 08:40 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20/ === 2026-03-24 === * 19:40 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255746 * 15:34 brennen: gitlab1004: manual test run of `configure-projects` with cleared issue allowlist ([[phab:T412882|T412882]]) * 15:26 bd808: Unblock 47.194.0.0/16 ([[phab:T421127|T421127]]) * 12:53 hashar: integration: deleted old Puppet 5 compiler agents from Jenkins ( pcc-worker1014.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1015.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1016.puppet-diffs.eqiad1.wikimedia.cloud ) # [[phab:T367399|T367399]] * 07:42 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1259755 === 2026-03-23 === * 15:28 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 90272 === 2026-03-22 === * 14:52 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1258082 * 01:00 Krinkle: Reloading Zuul to deploy 
https://gerrit.wikimedia.org/r/1256488 === 2026-03-21 === * 08:10 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256962 * 07:48 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256946 === 2026-03-20 === * 21:21 bd808: Unblock 103.159.218.0/24 ([[phab:T420530|T420530]]) * 14:59 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add dependency on CodeMirror, for [[phab:T399673|T399673]] === 2026-03-19 === * 16:54 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255777 * 16:01 Krinkle: Hoist l10n-bot rights from labs/tools parent to labs parent to reduce duplication in other labs/ repos * 15:50 Krinkle: Create labs/xtools repo (branch: main, parent: labs, owner: labs-xtools), ref [[phab:T402086|T402086]] === 2026-03-18 === * 21:11 dcausse: [[phab:T403775|T403775]]: reindexing all wikis to enable new sorting options * 21:08 dcausse: restarting opensearch on deployment-cirrussearch(12{{!}}13{{!}}14) instances to pick up new plugin versions * 14:56 James_F: Zuul: Handle wmf/next the same way as wmf/branch_cut_pretest * 14:52 James_F: Zuul: [GrowthExperiments] drop duplicate VisualEditor dep * 14:52 James_F: Zuul: [search/*] Add experimental Java 25 jobs === 2026-03-17 === * 22:50 James_F: Zuul: [mediawiki/extensions/JsonForms] Add quibble jobs * 21:27 James_F: Zuul: search: Update opensearch plugins for Java 11/17, for [[phab:T420407|T420407]] * 20:20 bd808: Resize deployment-sessionstore06 from g4.cores1.ram2.disk20 to g4.cores2.ram4.disk20 ([[phab:T415021|T415021]]) * 16:43 James_F: Zuul: [BlueSpicePermissionManager] Add …ConfigManager & …UserManager deps * 14:36 James_F: Zuul: [mediawiki/extensions/ArticleGuidance]: Add SpamBlacklist as phan dep, for [[phab:T420015|T420015]] === 2026-03-13 === * 13:59 andrewbogott: deleting ptr record 117.0.16.172.in-addr.arpa. 
-- accidental duplicate for deployment-kafka-logging01.deployment-prep.eqiad1.wikimedia.cloud * 13:04 elukey: re-create kafka-logging-01 in deployment-prep on trixie and Kafka 3.7 (was running on buster) * 09:13 elukey: upgrade kafka-jumbo and kafka-main to Confluent 7.7 in deployment-prep (pre-requisite before being able to upgrade to Trixie) === 2026-03-12 === * 21:23 bd808: Hard reboot deployment-sessionstore06 ([[phab:T415021|T415021]]) * 01:14 James_F: Docker: [helm-linter] Bump for Envoy 1.35.9, for [[phab:T419637|T419637]] === 2026-03-11 === * 16:48 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/MetricsPlatform # [[phab:T417568|T417568]] * 16:47 James_F: Zuul: [mediawiki/extensions/MetricsPlatform] Archive, for [[phab:T416865|T416865]] * 11:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1250529 "inference-services: Split policy violation CI into separate model jobs." - [[phab:T418832|T418832]] === 2026-03-10 === * 17:39 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner production * 17:11 hashar: Updated MediaWiki coverage jobs so that they now keep "Generate a local configuration by running `composer phpunit:config`" message # [[phab:T419073|T419073]] * 16:41 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner staging * 08:21 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 === 2026-03-09 === * 21:53 bd808: Reboot deployment-shellbox01 on the off chance that it makes the new permissions error go away ([[phab:T419440|T419440]]) * 13:13 James_F: Zuul: [mediawiki/extensions/WikiShare] Mark as archived, for [[phab:T413589|T413589]] * 13:11 James_F: Zuul: [mediawiki/extensions/Memento] Mark as archived, for [[phab:T369991|T369991]] * 13:10 James_F: Zuul: [mediawiki/extensions/QuickGV] Mark as archived, for [[phab:T413348|T413348]] * 13:10 James_F: Zuul: [mediawiki/extensions/SemanticImageInput] Mark as archived, for 
[[phab:T413588|T413588]] * 13:09 James_F: Zuul: [mediawiki/extensions/SidebarDonateBox] Mark as archived, for [[phab:T413587|T413587]] * 13:07 James_F: Zuul: [mediawiki/extensions/SemanticSifter] Mark as archived, for [[phab:T413586|T413586]] * 13:06 James_F: Zuul: [mediawiki/extensions/GoogleAdSense] Mark as archived, for [[phab:T413585|T413585]] * 13:04 James_F: Zuul: [mediawiki/extensions/SecurityAPI] Mark as archived, for [[phab:T418008|T418008]] * 12:50 James_F: Zuul: [mediawiki/extensions/CheckUser] Add DiscussionTools dependency * 12:50 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add dependencies for TestKitchen * 10:40 hashar: gerrit: mediawiki/vendor: converted `es6` and `es710` branches to tags # [[phab:T417804|T417804]] * 09:24 hashar: Updating Quibble jobs to 1.16.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1248880 {{!}} [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 09:15 hashar: updating all CI Jenkins jobs using `./jjb-update` === 2026-03-06 === * 19:46 James_F: Zuul: [mediawiki/services/geoshapes] Mark as archived, for [[phab:T418372|T418372]] * 16:37 hashar: Building Docker images for Quibble 1.16.0 * 16:31 hashar: Tag Quibble 1.16.0 @ {{Gerrit|0b9db5fe3cabb2cec0b5d44e128bafa917b3b895}} # [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 12:32 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248411 "jjb, Zuul: vary Wikibase Selenium for release branches" {{!}} [[phab:T418797|T418797]] * 12:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248409/ "jjb, Zuul: rename wikibase-selenium job for clarity" {{!}} [[phab:T418797|T418797]] === 2026-03-05 === * 14:41 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add TestKitchen as a dependency for [[phab:T418053|T418053]] * 08:01 hashar: Reloaded Zuul to rename wikibase-client / wikibase-repo jobs {{!}} https://gerrit.wikimedia.org/r/1238317 * 00:04 James_F: Docker: 
[quibble-coverage] Use local PHPUnit config, for [[phab:T345481|T345481]] === 2026-03-04 === * 21:16 James_F: Zuul: [mediawiki/core] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 21:10 James_F: Zuul: [mediawiki/vendor] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 19:48 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/96 ([[phab:T419004|T419004]]) * 18:50 James_F: Revert "Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency", for [[phab:T419043|T419043]] * 16:23 James_F: Zuul: [mediawiki/services/parsoid] Make PHP 8.4 voting * 15:37 James_F: Docker: [rake-ruby2.7] Add libffi-dev too, for [[phab:T418463|T418463]] * 13:59 James_F: Docker: [rake-ruby2.7] Add ruby-ffi for [[phab:T418463|T418463]] * 13:54 hashar: SIGKILL Zuul cause it can't gracefully stop most probably due to being locked attempting to report back to Gerrit # [[phab:T419009|T419009]] * 13:49 hashar: Stopping Zuul # [[phab:T419009|T419009]] * 13:41 hashar: Took a Zuul stack dump on contint1002.wikimedia.org using SIGUSR1 # [[phab:T419009|T419009]] === 2026-03-03 === * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] Drop MetricsPlatform phan dep * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Drop MetricsPlatform phan dep === 2026-03-02 === * 22:13 James_F: Zuul: Enforce PHP 8.4 in MW extensions and skins for development branch, for [[phab:T386108|T386108]] * 14:05 James_F: Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency, for [[phab:T415451|T415451]] * 13:48 James_F: Zuul: […/WikimediaEvents] Drop LoginNotify dependency, now unused, for [[phab:T404334|T404334]] * 10:16 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/15.8.2/ # [[phab:T418718|T418718]] === 
2026-02-28 === * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/skins` # [[phab:T418675|T418675]] * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/extensions` # [[phab:T418675|T418675]] === 2026-02-27 === * 15:53 dancy: Updating gitlab-cloud-runners (staging and prod) to gitlab-runner 18.9.0. === 2026-02-26 === * 20:16 James_F: Zuul: Provide a custom, high-priority pipeline just for puppet compiler [[phab:T414621|T414621]] * 19:32 James_F: Docker: Bump all the PHPs. * 13:40 hashar: Deployed Jenkins job https://integration.wikimedia.org/ci/job/wikibase-selenium/ # [[phab:T287582|T287582]] * 00:13 dduvall: forcing replacement of buildkitd helm release in gitlab-cloud-runner prod cluster due to dependency on removed k8s secret ([[phab:T416260|T416260]]) === 2026-02-25 === * 23:50 dduvall: deploying https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/552 to gitlab-cloud-runner production cluster ([[phab:T416260|T416260]]) * 14:07 James_F: Zuul: [mediawiki/extensions/CommunityRequests] Add TemplateData dependency, for [[phab:T401638|T401638]] * 00:08 jeena: no-op testing updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/95 === 2026-02-24 === * 15:55 brennen: devtools: test deploy phab/phorge to test instance ([[phab:T418256|T418256]]) === 2026-02-23 === * 23:07 jeena: Updated development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:43 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:12 bd808: Unblock 191.80.192.0/18 ([[phab:T418132|T418132]]) * 20:26 hashar: Deleted "replication-upstream" Grafana dashboard in favor of a copy/new "replication" one. 
https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream , replaced it by https://grafana.wikimedia.org/d/d4a4da73-c27f-4ce6-a9e5-ab84dd7a4ebb/replication * 16:29 James_F: Zuul: [3d2png] Add basic Node CI at version 20 === 2026-02-20 === * 21:47 bd808: Unblock 168.184.84.0/24 ([[phab:T418020|T418020]]) * 17:13 bd808: Unblock 122.187.64.0/18 ([[phab:T417964|T417964]]) * 14:35 James_F: Zuul: [mediawiki/extensions/Monstranto] Move out of Wikimedia prod section === 2026-02-19 === * 18:34 bd808: Unblock 181.98.0.0/16 ([[phab:T417890|T417890]]) * 17:21 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Add AbuseFilter as a dependency, for [[phab:T417799|T417799]] * 13:22 hashar: Reloaded Zuul to archive the Cergen repository {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1240688 {{!}} [[phab:T417887|T417887]] === 2026-02-18 === * 20:17 jeena: Updating development images on contint primary for [[phab:T415922|T415922]] * 19:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240360 * 18:40 bd808: Unblock 46.59.0.0/17 ([[phab:T417747|T417747]]) * 17:05 hashar: Regenerating Jenkins jobs with JJB based on https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 17:04 hashar: Added EXT_DEPENDENCIES to Quibble Jenkins jobs parameters so we can manually trigger them from the Web UI using a different set of deps # https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 16:30 hashar: Triggered https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ with empty Zuul parameters introduced by https://gerrit.wikimedia.org/r/1240333 {{!}} https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/4893/console * 15:43 James_F: Zuul: [mediawiki/extensions/ReadingLists] Add EventBus dependency for [[phab:T417706|T417706]] * 12:15 hashar: zuul-1001.zuul3.eqiad1.wikimedia.cloud: added keepalive=20 to the scheduler Gerrit driver and restarted scheduler container # [[phab:T417497|T417497]] * 06:58 jeena: 
Updating development images on contint primary for [[phab:T415922|T415922]] === 2026-02-17 === * 23:37 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240081 * 23:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240078 * 15:58 brennen: deployed latest phab/phorge wmf/stable to devtools test instance ([[phab:T417657|T417657]]) * 09:01 hashar: Reloaded Zuul to enable php 8.5 testing on utfnormal, php-session-serializer, wikipeg, mediawiki/libs/Dodo, mediawiki/libs/UUID, testing-access-wrapper and translatewiki # [[phab:T406326|T406326]] === 2026-02-16 === * 15:27 hashar: Manually cleaned some old workspaces on integration-agent-docker-1042 === 2026-02-12 === * 20:07 James_F: Zuul: Enable PHP 8.5 jobs for most MW libraries, for [[phab:T406326|T406326]] * 19:33 James_F: Docker: [php83] Re-build with upstream's new 8.3.30 release and cascade * 19:31 James_F: Zuul: Add PHP 8.5 CI job to various things noted as blocked by Phan, for [[phab:T410941|T410941]], [[phab:T406326|T406326]] * 16:35 Krinkle: Disable publishing noise on tasks from repos Bcp47, clover-diff, ScopedCallback, and IDLeDOM. 
Ref [[phab:T143162|T143162]] * 15:53 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/87 * 11:21 James_F: Zuul: [mediawiki/libs/shellbox] Add direct Phan job, for [[phab:T416064|T416064]] === 2026-02-10 === * 20:16 dancy: Rebooted k3s.catalyst-dev (it was unresponsive, but the reboot hasn't helped) === 2026-02-09 === * 21:58 James_F: Zuul: [mediawiki/tools/phan] Add PHP 8.5 CI job, for [[phab:T410941|T410941]] * 19:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1238006 [[phab:T415680|T415680]] * 11:51 James_F: Zuul: [mediawiki/extensions/ReadingLists] Drop MetricsPlatform dependency, for [[phab:T414435|T414435]] === 2026-02-05 === * 17:58 James_F: Zuul: […/WikimediaCustomizations] Add six new dependencies for [[phab:T404334|T404334]] * 15:35 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1237254 * 15:18 James_F: Zuul: […/OATHAuth] Add dependency and phan dependency on CentralAuth === 2026-02-04 === * 12:54 James_F: Zuul: [mediawiki/extensions/Petition] Add CLDR dependency * 10:03 hashar: Restarted Jenkins on releases2003.codfw.wmnet === 2026-02-02 === * 21:17 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1234926 "re-enable master jobs for some BlueSpice repos - [[phab:T403196|T403196]]" * 21:05 bd808: Unblock 85.146.0.0/17 ([[phab:T416079|T416079]]) * 19:47 James_F: Zuul: […/WikimediaCustomizations] Add cldr phan dependency, for [[phab:T404334|T404334]] * 17:33 bd808: Unblock 188.188.0.0/15 ([[phab:T416095|T416095]]) * 17:26 bd808: Unblock 85.94.84.0/22 ([[phab:T416105|T416105]]) * 17:09 bd808: Unblock 94.234.0.0/16 ([[phab:T416165|T416165]]) * 16:51 dancy: Update gitlab-runners to alpine-v18.6.6 ([[phab:T415214|T415214]]) * 16:27 bd808: Unblock 47.231.208.0/21 ([[phab:T416010|T416010]]) * 11:39 James_F: Zuul: […/WikimediaCustomizations] Add five new phan dependencies, for [[phab:T404334|T404334]] * 09:45 
Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 58532, 58557 === 2026-01-31 === * 21:49 James_F: Deleted Jenkins's job entry for castor-save-workspace-cache {{Gerrit|6193776}} and this seems to have unstuck things for [[phab:T416078|T416078]]? * 21:45 James_F: Running `sudo systemctl restart jenkins` on contint for [[phab:T416078|T416078]] * 21:44 James_F: Fighting [[phab:T416078|T416078]], took integration-castor-5 offline, disconnected, sshed in to kill threads, then reconnected; no change in aspect. * 19:03 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1235380 === 2026-01-28 === * 21:26 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn # [[phab:T415832|T415832]] * 21:11 bd808: Unblock 181.160.0.0/15 & 186.40.128.0/17 ([[phab:T415820|T415820]]) * 17:01 bd808: Unblock 102.182.0.0/16 ([[phab:T415782|T415782]]) === 2026-01-27 === * 16:45 James_F: Zuul: Switch skin-quibble template with identical extension-quibble, for [[phab:T402398|T402398]] * 16:18 James_F: Zuul: [ArticleGuidance] mention it will be in production * 15:55 James_F: Docker: [quibble-bullseye] Update to Quibble 1.15.0 * 15:12 James_F: Docker: [quibble-coverage] Pass PHPUnit config location explicitly, for [[phab:T395470|T395470]] * 09:18 hashar: integration: on integration-castor05, deleted caches for old MediaWiki branches * 09:15 hashar: integration: on pkgbuilder instances, removed Buster cow images, aptcache and hooks. 
`sudo cumin --force -p 0 'name:pkgbuilder' 'rm -fR /srv/pbuilder/<nowiki>{</nowiki>base-buster-amd64.cow,hooks/buster,aptcache/buster-amd64<nowiki>}</nowiki>'` # [[phab:T397209|T397209]] * 09:14 hashar: integration: cleaned up old workspaces under /srv/jenkins/workspace === 2026-01-26 === * 23:27 bd808: Unblock 66.130.0.0/15 ([[phab:T415596|T415596]]) * 22:52 bd808: Unblock 45.16.0.0/12 ([[phab:T415467|T415467]]) * 14:46 hashar: gerrit: changed `operations/software/permissions` project type from `CODE` to `PERMISSIONS` by pointing `HEAD` to `refs/meta/config` === 2026-01-22 === * 17:36 James_F: Docker: [quibble-coverage] Stop using legacy PHPUnit entrypoint ([[phab:T395470|T395470]]) & Stop excluding Dump/ParserFuzz/Stub groups ([[phab:T415230|T415230]]) * 15:11 James_F: Zuul: [mediawiki/extensions/Math] Add a standalone job, for [[phab:T415230|T415230]] === 2026-01-20 === * 20:38 bd808: Cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229186 ([[phab:T415113|T415113]]) * 19:05 bd808: Rebooted deployment-cache-text08 to see if the mystery haproxy startup failure would go away ([[phab:T415100|T415100]]) * 18:50 bd808: Unblock 152.7.0.0/16 ([[phab:T415100|T415100]]) === 2026-01-17 === * 23:32 ori: beta-scap with `php_l10n: true` completed successfully: https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/241466/console. PHP l10n files generated. Reverted local change to scap.cfg. * 23:26 ori: Temporarily set `php_l10n: true` on deployment-deploy04:/etc/scap.cfg to see if next scap succeeds. 
=== 2026-01-16 === * 16:33 dancy: Deleting deployment-mx03.deployment-prep ([[phab:T412975|T412975]]) === 2026-01-15 === * 14:50 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/ArticleSummaries/ # [[phab:T413232|T413232]] === 2026-01-14 === * 17:14 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226907 * 16:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226893 * 15:57 bd808: Unblock 190.60.63.0/24 ([[phab:T414541|T414541]]) === 2026-01-13 === * 15:04 James_F: Zuul: Make quibble-for-mediawiki-core-vendor-mysql-php84 voting, for [[phab:T386108|T386108]] === 2026-01-12 === * 21:33 zabe: zabe@deployment-mwmaint03:~$ foreachwiki migrateLinksTable.php --table imagelinks # [[phab:T413668|T413668]] * 21:06 bd808: Unblock 66.81.168.0/21 ([[phab:T414303|T414303]]) * 17:42 dancy: Turned off instance deployment-prep.deployment-mx03 * 11:44 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 46331, 46344 === 2026-01-10 === * 21:48 taavi: reload zuul for https://gerrit.wikimedia.org/r/1224782 * 00:25 bd808: Unblock 91.160.0.0/12 ([[phab:T414190|T414190]]) === 2026-01-09 === * 17:33 thcipriani: re-enabling beta update jobs after test bad extension-list [[phab:T411516|T411516]] * 17:09 thcipriani: disabling beta update jobs to test bad extension-list [[phab:T411516|T411516]] === 2026-01-08 === * 21:30 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224815 [[phab:T414136|T414136]] * 18:24 bd808: Unblock 89.80.0.0/12 ([[phab:T414113|T414113]]) * 15:55 dancy: Upgrading gitlab-runner to v18.5.0 on gitlab-cloud-runners.
([[phab:T414053|T414053]]) === 2026-01-07 === * 23:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1082574 https://gerrit.wikimedia.org/r/1224157 https://gerrit.wikimedia.org/r/1224159 * 23:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/896311 [[phab:T27482|T27482]] * 23:06 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224218 * 17:34 James_F: Zuul: Add new extensions: IssueTrackerLinks, PreviewLinks, and WikiRAG * 17:34 James_F: Zuul: [labs/tools/heritage] Point to the task to drop 8.1 testing * 15:09 James_F: Zuul: [labs/tools/heritage] Add testing in PHP 8.2+, not just PHP 8.1 * 15:03 James_F: Zuul: Even for extension-broken, don't offer PHP 8.1 testing * 15:02 James_F: Zuul: Move quibble experimental sqlite/postgres tests to PHP 8.3 === 2026-01-06 === * 16:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223690 [[phab:T411814|T411814]] * 16:16 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223189 [[phab:T411814|T411814]] * 00:30 bd808: Unblock 85.134.128.0/17 ([[phab:T413755|T413755]]) * 00:02 bd808: Unblock 89.166.128.0/17 ([[phab:T413702|T413702]]) === 2026-01-05 === * 23:57 bd808: Unblock 185.233.104.0/22 ([[phab:T413472|T413472]]) * 23:51 bd808: Unblock 45.62.112.0/21 ([[phab:T413079|T413079]]) * 23:44 bd808: Unblock 85.134.200.0/21 ([[phab:T413067|T413067]]) * 19:03 dancy: Updated buildkitd to v0.26.3 in gitlab-cloud-runners * 14:27 taavi: reload zuul for {{Gerrit|1223191}} * 13:57 James_F: Zuul: [mediawiki/php/wmerrors] Enable PHP 8.5 testing, for [[phab:T410921|T410921]] === 2026-01-03 === * 17:59 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222709 https://gerrit.wikimedia.org/r/1220388 https://gerrit.wikimedia.org/r/1219140 === 2026-01-02 === * 17:10 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222597 === 2026-01-01 === * 02:34 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1221644 <noinclude>'''Server Admin 
Log''' logged from {{IRC|wikimedia-releng}} for [[Nova Resource:Deployment-prep|Beta Cluster]], [[mw:Continuous integration|Continuous integration]] and various other Release Engineering projects.</noinclude> {{SAL-archives/Release Engineering}} <noinclude>[[Category:SAL]]</noinclude> 4stlj10kuuwyohugcp62ooyqevqb5bv 2414285 2414284 2026-05-15T18:24:39Z Stashbot 7414 dancy: Upgrading gitlab-cloud-runners (prod) from 1.35.1-do.5 to 1.35.1-do.6 (T426436) 2414285 wikitext text/x-wiki === 2026-05-15 === * 18:24 dancy: Upgrading gitlab-cloud-runners (prod) from 1.35.1-do.5 to 1.35.1-do.6 ([[phab:T426436|T426436]]) * 18:11 dancy: Upgraded gitlab-cloud-runners (staging) from 1.35.1-do.5 to 1.35.1-do.6 ([[phab:T426436|T426436]]) * 17:59 dancy: Upgrading gitlab-cloud-runners (staging) from 1.35.1-do.5 to 1.35.1-do.6 ([[phab:T426436|T426436]]) * 13:02 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1287828 [[phab:T426392|T426392]] === 2026-05-13 === * 12:42 James_F: Zuul: [mediawiki/extensions/Springboard] Add AdminLinks Phan dependency * 12:42 James_F: Zuul: [mediawiki/extensions/ChatBot] Add dependencies on VisualEditor and BlueSpiceFoundation * 12:42 James_F: Zuul: [mediawiki/extensions/ChatIntegration] Add dependency on VisualEditor * 12:37 James_F: Zuul: [mediawiki/extensions/WikiLambda] Drop AF and SB deps down to phan-only, for [[phab:T423180|T423180]] === 2026-05-12 === * 20:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/104 ([[phab:T424774|T424774]]) * 18:08 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add AF and SB deps for [[phab:T423180|T423180]] * 14:18 atsukoito: PrivateSettings: empty $wgOpensearchCredentials for opensearch-on-k8s synced to deploy04 by Reedy * 13:04 atsukoito: PrivateSettings: credentials for opensearch-on-k8s ttmserver-test * 11:50 James_F: Zuul: [machinelearning/liftwing/inference-services] Add qwen36 llm model CI/CD pipelines, for 
[[phab:T425680|T425680]] * 11:46 James_F: Zuul: Add experimental php-pie-build* jobs to other PHP extensions, for [[phab:T425943|T425943]] * 11:37 James_F: Zuul: [mediawiki/php/wikidiff2] Add experimental php-pie-build* jobs, for [[phab:T425943|T425943]] * 10:05 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-with-Wikibase-extensions-browser-tests-only-vendor-php83 # fix failure seen in quibble-with-Wikibase-extensions-browser-tests-only-vendor-php83 7817 * 08:44 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/ # broken Cypress cache? hopefully fix failure seen in quibble-vendor-mysql-php83-selenium 51633 === 2026-05-11 === * 18:28 James_F: Docker: Add changes to php-compile images for PIE, for [[phab:T425943|T425943]] * 16:06 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/ # broken Cypress cache? 
hopefully fix failure seen in quibble-vendor-mysql-php83-selenium 51439 and 51452 === 2026-05-09 === * 20:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1285498 === 2026-05-07 === * 22:53 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/105 === 2026-05-06 === * 18:13 bd808: Unblock 88.165.192.0/19 * 18:03 bd808: Unblock 94.208.0.0/14 * 17:56 bd808: Unblock 84.226.0.0/16 * 17:41 bd808: Unblock 94.34.0.0/16 * 17:35 bd808: Unblock 109.134.0.0/16 === 2026-05-05 === * 21:20 James_F: Zuul: Provide Node 26 experimental jobs everywhere needed * 21:04 James_F: Docker: Provide initial Node 26 images * 19:01 James_F: Zuul: [mediawiki/extensions/PageAssessments] Add Scribunto dependency, for [[phab:T396135|T396135]] * 14:58 dancy: rm /var/log/<nowiki>{</nowiki>user.log.1,syslog.1,messages.1<nowiki>}</nowiki> on deployment-eventgate-4.deployment-prep ([[phab:T425429|T425429]]) === 2026-05-04 === * 15:19 dancy: Upgrading gitlab cloud runners (prod) from 1.35.1-do.3 to 1.35.1-do.5 * 14:51 dancy: Upgrading gitlab cloud runners (staging) from 1.35.1-do.3 to 1.35.1-do.5 * 10:40 James_F: Zuul: Provide non-voting PHP 8.4/8.5 Quibble jobs for bluespice template === 2026-05-02 === * 20:49 James_F: Zuul: [mediawiki/core] Enforce PHP 8.4 & 8.5 on release branches, all pass * 19:27 James_F: Zuul: Provide non-voting PHP 8.4/8.5 Quibble jobs for MW release branches * 19:19 James_F: Zuul: [mediawiki/extensions/BlogPage] Add dependencies * 16:48 James_F: Hard-restarting Zuul to clear the huge number of i18n updates being re-submitted.
* 15:48 James_F: Zuul: [wikimedia-cz/*] Test in PHP 8.3+, dropping 8.2 * 14:02 TheresNoTime: Add bvibber to deployment-prep project * 09:08 James_F: Docker: [quibble-*] Add php-luasandbox so we can test both modes in Scribunto === 2026-05-01 === * 15:42 James_F: Zuul: [wikimedia/lucene-explain-parser] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: Zuul: [wikimedia/textcat] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: Zuul: [mediawiki/tools/ParseWiki] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: zuul: Add ToprakM to CI allowlist * 15:19 James_F: Zuul: [translatewiki] Test in PHP 8.3+, dropping 8.2 * 15:10 James_F: Zuul: [mediawiki/extensions/WikiEditor] Add TestKitchen as a dependency, for [[phab:T425076|T425076]] * 12:40 James_F: Zuul: [mediawiki/tools/code-utils] Test in PHP 8.3+, dropping 8.2 * 08:02 James_F: Zuul: Update xtex's e-mail in the allowlist * 07:37 James_F: Zuul: Switch release branches' selenium jobs to PHP 8.3 * 07:33 James_F: Zuul: Test Wikimedia production libraries in PHP 8.3+, dropping 8.2 === 2026-04-30 === * 21:36 brennen: gitlab-webhooks: building & restarting to deploy https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/40 * 20:26 James_F: Zuul: [mediawiki/tools/api-testing] Make PHP 8.5 CI voting * 20:16 James_F: jforrester@doc1004:~$ # sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn/ # [[phab:T415832|T415832]] * 20:14 James_F: Zuul: [mediawiki/extensions/WebAuthn] Archive, for [[phab:T415832|T415832]] / [[phab:T303495|T303495]] * 17:16 brennen: wikibugs: most maintainers at hackathon, so go release-engineering added as a maintainer while looking to debug error at https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/jobs/810904 * 15:19 mutante: upgrading zuul to 14.2.0-1 on "new zuul" machines ([[phab:T424879|T424879]]) === 2026-04-29 === * 15:49 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add ConfirmEdit dependency, for [[phab:T424597|T424597]] * 15:36 James_F: Zuul: Drop 
experimental node22 jobs, never used in practice * 15:28 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1279392, https://gerrit.wikimedia.org/r/1279397 === 2026-04-28 === * 18:11 bd808: Unblock 86.0.0.0/16 * 17:41 bd808: Unblock 79.192.0.0/10 * 17:07 James_F: Zuul: [mediawiki/tools/phpunit-patch-coverage] Drop PHP 8.2 testing * 17:07 James_F: Zuul: [mediawiki/tools/minus-x] Drop PHP 8.2 testing * 17:07 James_F: Zuul: [mediawiki/tools/codesniffer] Drop PHP 8.2 testing * 16:32 James_F: Zuul: [mediawiki/services/jobrunner] Drop PHP 8.2 testing * 13:34 James_F: Zuul: [mediawiki/tools/phan] Drop PHP 8.2 testing * 13:34 James_F: Zuul: [oojs/ui] Drop PHP 8.2 testing * 13:14 James_F: Zuul: [mediawiki/tools/phan/SecurityCheckPlugin] Drop PHP 8.2 CI * 10:40 Silvan_WMDE: sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwext-node24-rundoc/ # run on integration-castor06.integration.eqiad1.wikimedia.cloud to fix failure seen in mwext-node24-rundoc #1717 * 00:03 bd808: Increase parallelism for wmf-beta-update-databases.py ([[phab:T256168|T256168]]) === 2026-04-27 === * 22:11 bd808: Beta Cluster MediaWiki update logs now available via https://beta-update.wmcloud.org/ ([[phab:T256168|T256168]]) * 21:57 bd808: Add web security group to deployment-deploy04 ([[phab:T256168|T256168]]) * 20:45 James_F: Zuul: Restrict mw*-codehealth-patch jobs to master only, for [[phab:T424573|T424573]] * 17:16 James_F: Docker: [mediawiki-phan-taint-check-demo] Re-platform to Trixie and so PHP 8.4 * 15:53 James_F: Zuul: [mediawiki/extensions/ReportIncident] Add TestKitchen phan dependency, for [[phab:T424220|T424220]] * 14:32 James_F: Zuul: Drop PHP 8.2 enforcement from MediaWiki things for master and REL1_46 for [[phab:T358667|T358667]] * 12:38 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mwext-node24-docs-publish # fix failure seen in mwext-node24-docs-publish 
383 * 09:18 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover/mediawiki-libs-node-cssjanus/ # [[phab:T424419|T424419]] === 2026-04-26 === * 20:49 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1276777 === 2026-04-24 === * 22:48 dduvall: merged zuul3 branch of integration/config into master and pushed (in preparation for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1277198) * 12:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1276428 === 2026-04-23 === * 23:57 bd808: Set `profile::beta::autoupdater::run_updater: true` for deployment-deploy04 via Horizon ([[phab:T256168|T256168]]) * 22:58 bd808: bd808@deployment-deploy04 `sudo -u jenkins-deploy /usr/local/bin/wmf-beta-update-all` * 22:36 bd808: bd808@deployment-deploy04 `sudo -u mwdeploy /usr/local/bin/wmf-beta-update-all` * 22:16 bd808: Disabled https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad so that replacement script can be tested ([[phab:T256168|T256168]]) * 22:12 bd808: Disabled https://integration.wikimedia.org/ci/job/beta-code-update-eqiad so that replacement script can be tested ([[phab:T256168|T256168]]) * 22:02 bd808: Cherry-picked {{gerrit|1276813}} to deployment-puppetserver-1 ([[phab:T256168|T256168]]) * 20:11 James_F: Zuul: [wikibase/*] Replace CI testing in Node 20 with Node 24 * 20:11 James_F: Zuul: [wikidata/query/*] Replace CI testing in Node 20 with Node 24 * 20:11 James_F: Zuul: [analytics/*] Replace CI testing in Node 20 with Node 24 * 20:10 James_F: Zuul: [mediawiki/tools/*] Replace CI testing in Node 20 with Node 24 * 20:06 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.34.5-do.3 to 1.35.1-do.3 ([[phab:T423726|T423726]]) * 19:55 James_F: Zuul: [jquery-client] Replace CI testing in Node 20 with Node 24 * 19:51 James_F: Zuul: [wikipeg] Drop testing in Node 20 and Node 22 * 19:47 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.34.5-do.3 to 1.35.1-do.3 ([[phab:T423726|T423726]]) * 
19:37 James_F: Zuul: [oojs/ui] Drop CI testing in Node 20 and Node 22 * 19:37 James_F: Zuul: [oojs/js] Drop CI testing in Node 20 and Node 22 * 19:37 James_F: Zuul: [unicodejs] Replace CI testing in Node 20 with Node 24 * 19:36 James_F: Zuul: [wikimedia/portals] Drop CI testing in Node 20 and Node 22 * 18:57 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.33.9-do.3 to 1.34.5-do.3 ([[phab:T423726|T423726]]) * 18:39 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.33.9-do.3 to 1.34.5-do.3 ([[phab:T423726|T423726]]) * 18:18 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.33.9-do.2 to 1.33.9-do.3 ([[phab:T423726|T423726]]) * 17:58 James_F: Zuul: [mediawiki/extensions/OAuth] Add dependency on CentralAuth, for [[phab:T415281|T415281]] * 17:56 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.32.13-do.2 to 1.33.9-do.3 ([[phab:T423726|T423726]]) * 16:35 James_F: Zuul: Enforce PHP 8.5 CI for MW things in master (and REL1_46), for [[phab:T411814|T411814]] * 16:19 James_F: Zuul: [mediawiki/services/parsoid] Enable PHP 8.5 CI * 15:47 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add AntiSpoof dependency, for [[phab:T420548|T420548]] * 14:20 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24 # fix failure seen in mediawiki-node24 8385 and 8405 * 12:56 James_F: Zuul: [mediawiki/extensions/GrowthExperiments] Add CentralNotice dependency, for [[phab:T422082|T422082]] === 2026-04-22 === * 00:07 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add MF dependency, for [[phab:T424113|T424113]] === 2026-04-21 === * 23:26 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add CommunityConfiguration dep too, for [[phab:T394410|T394410]] * 23:17 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add standalone test jobs, for [[phab:T422031|T422031]] * 20:47 inflatador: updating cirrussearch hosts to Trixie/OpenSearch 2 
[[phab:T421763|T421763]] * 20:38 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add CommunityConfiguration phan dep, for [[phab:T394410|T394410]] * 20:17 bd808: Running tofu for [[phab:T421244|T421244]] * 18:00 James_F: Zuul: [mediawiki/extensions/WatchAnalytics] Add ApprovedRevs Phan dependency * 16:35 bd808: Unblock 79.116.0.0/16 * 13:34 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add TestKitchen phan dep, for [[phab:T415254|T415254]] * 13:27 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add CentralAuth dependency, for [[phab:T420548|T420548]] === 2026-04-20 === * 23:56 bd808: Unblock 76.157.0.0/16 * 18:28 dancy: Upgrading gitlab cloud runners (staging) to 1.33.9-do.2 ([[phab:T423726|T423726]]) * 18:28 dancy: Upgrading gitlab cloud runners (staging) ([[phab:T423726|T423726]]) * 18:19 James_F: jjb: All 486 (!) jobs now updated for [[phab:T423622|T423622]] * 18:18 bd808: Unblock 113.128.0.0/15 * 15:03 James_F: Docker: Bump ci-bullseye/-bookworm/-trixie for mirrors.wm.org removal, [[phab:T423622|T423622]] === 2026-04-19 === * 19:53 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272752 === 2026-04-17 === * 21:07 thcipriani: marking integration-agent-1080 offline for experimentation * 19:30 thcipriani: reconfiguring castor-save-workspace-cache with https://gerrit.wikimedia.org/r/1273935 * 17:47 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.32.10-do.1 to 1.32.13-do.2 ([[phab:T423726|T423726]]) * 16:49 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.32.10-do.1 to 1.32.13-do.2 ([[phab:T423726|T423726]]) === 2026-04-16 === * 20:49 dduvall: creating integration/zuul-jobs repo to serve as a mirror of opendev.org/zuul/zuul-jobs ([[phab:T406384|T406384]]) * 13:38 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272711 [[phab:T423568|T423568]] * 11:07 Silvan_WMDE: sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24/ # run on 
integration-castor06.integration.eqiad1.wikimedia.cloud === 2026-04-15 === * 20:05 James_F: Zuul: Configure REL1_46 CI, for [[phab:T423257|T423257]] * 17:44 bd808: Unblock 176.0.0.0/13 * 17:39 bd808: Unblock 46.128.0.0/16 * 17:32 bd808: Unblock 176.86.0.0/16 * 16:39 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/commit/127d783b2176ac60b646a5fa4f1b1a872ca66340 * 15:33 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/100 * 01:02 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/99 === 2026-04-14 === * 20:42 James_F: Docker: [composer-scratch] Upgrade composer to 2.9.7 and cascade * 16:35 bd808: Unblock 88.112.0.0/14 * 00:48 bd808: Unblock 24.6.0.0/16 * 00:42 bd808: Unblock 152.231.48.0/20 === 2026-04-13 === * 22:00 James_F: Zuul: [mediawiki/vendor] Drop accidental Wikibase browser tests on branches * 20:28 James_F: Zuul: [mediawiki/extensions/Chart] Drop Doxygen publish job, not used * 14:42 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add FlaggedRevs dep, for [[phab:T421011|T421011]] === 2026-04-12 === * 18:21 James_F: jforrester@contint1002:~$ sudo /usr/sbin/service zuul restart && tail -f -n100 /var/log/zuul/zuul.log # [[phab:T423027|T423027]] === 2026-04-10 === * 23:22 James_F: jforrester@contint1002:~$ zuul enqueue --trigger gerrit --pipeline postmerge --project mediawiki/extensions/ReadingLists --change {{Gerrit|1269498}},2 # [[phab:T422976|T422976]] * 23:20 James_F: Zuul: [mediawiki/extensions/ReadingLists] Publish JS coverage, for [[phab:T422976|T422976]] * 23:13 James_F: Zuul: Migrate a few straggler Node 20 MediaWiki things to Node 24 * 23:01 James_F: Zuul: Move all MediaWiki things from mediawiki-node20 to mediawiki-node24 * 21:59 James_F: Docker: Bump Node base images to March releases and cascade; Upgrade Quibble 
images from Node 20 to Node 24 * 10:24 hashar: Updating all Quibble jobs to 1.17.1 * 10:22 hashar: Updated PostgreSQL jobs to Quibble 1.17.1 # [[phab:T422110|T422110]] * 10:22 hashar: Updated apitesting job to Quibble 1.17.1 # [[phab:T422843|T422843]] [[phab:T418743|T418743]] * 09:51 hashar: Tag Quibble 1.17.1 @ {{Gerrit|0a1ab3b7c3dfee36c9bc2e9b049957d94e190e85}} === 2026-04-09 === * 15:13 hashar: Rolling back Quibble jobs to 1.16.0 (api-testing stage fails due to missing npm install step) * 14:58 hashar: Upgrading Quibble jobs to 1.17.0 * 14:23 hashar: Tagged Quibble 1.17.0 @ {{Gerrit|864381c6b63bdbcd8c74a3162c406fffcaaf8694}} * 07:48 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1268559 "Zuul: use standalone jobs for GrowthExperiments Cypress tests" {{!}} [[phab:T417412|T417412]] === 2026-04-08 === * 22:19 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/1269068 * 22:01 bd808: Unblock 95.216.12.170/32 ([[phab:T422751|T422751]]) * 19:26 brennen: gitlab-webhooks: building & deploying https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/37 - hitting some build tooling stuff, trying a fix per instructions in the error log * 17:54 bd808: Unblock 167.56.0.0/13 ([[phab:T422721|T422721]]) * 06:31 hashar: Deleted integration-agent-castor05 Bullseye instance, replaced by integration-agent-castor06 which is on Bookworm # [[phab:T421114|T421114]] * 06:24 hashar: Deleted integration-agent-qemu-1003 Bullseye image, replaced by integration-agent-qemu-1004 which is on Bookworm # [[phab:T422488|T422488]] === 2026-04-07 === * 22:25 dduvall: adding new pipelinelib labels to ci nodes ([[phab:T422234|T422234]]) * 20:05 hashar: Triggered a build of https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/ * 17:06 dduvall: added `Docker` label to `contint` jenkins nodes ([[phab:T422507|T422507]]) * 17:05 dduvall: restored missing `pipelinelib` labels on
`integration-agent-docker-` CI hosts ([[phab:T422507|T422507]]) * 16:53 bd808: Unblock 73.0.0.0/8 ([[phab:T422498|T422498]]) * 12:36 hashar: jjb: use $CASTOR_HOST for Quibble success cache. https://gerrit.wikimedia.org/r/1268545 {{!}} This causes the Quibble jobs to use a new instance for the success cache, which is empty # [[phab:T383243|T383243]] [[phab:T421114|T421114]] * 12:17 hashar: Migrated Castor from integration-castor05 to integration-castor06. Updated CASTOR_HOST in Jenkins and moved the Cinder volume to the new instance # [[phab:T421114|T421114]] * 11:14 hashar: Added Bookworm based Jenkins agents to the pool Hostnames 1090, 1091, 1092 and 1093 # [[phab:T421114|T421114]] * 10:09 hashar: Added Bookworm based Jenkins agents to the pool Hostnames 1083 to 1089 # [[phab:T421114|T421114]] * 07:23 hashar: CI Jenkins: removed `blubber` label from all agents after having moved PipelineLib to use the `Docker` label {{!}} [[phab:T422234|T422234]] === 2026-04-06 === * 16:01 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/1268239 === 2026-04-03 === * 20:17 bd808: Unblock 2.54.0.0/16 ([[phab:T422238|T422238]]) * 17:25 bd808: Unblock 31.18.0.0/16 ([[phab:T422245|T422245]]) * 17:18 bd808: Unblock 2.54.128.0/19 ([[phab:T422238|T422238]]) * 16:18 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1264649 "add Python 3.14 to pywikibot jobs and separate lint tests" {{!}} [[phab:T421723|T421723]] * 09:26 hashar: integration: nuked pywikibot/core pre-commit cache # [[phab:T422242|T422242]] * 09:15 hashar: Added Bookworm based Jenkins agents to the pool with label `Docker`.
Hostnames are `integration-agent-docker-107*` # [[phab:T421114|T421114]] * 02:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1267398 === 2026-04-02 === * 16:50 thcipriani: restart jenkins * 15:15 bd808: Unblock 82.216.0.0/16 ([[phab:T421508|T421508]]) * 15:07 bd808: Unblock 95.90.0.0/15 ([[phab:T421485|T421485]]) * 11:19 James_F: Zuul: [oojs/ui] Drop ooui-ruby2.7-rake job, we're abandoning Ruby use there === 2026-04-01 === * 22:01 bd808: Unblock 109.144.0.0/12 ([[phab:T422019|T422019]]) * 20:16 bd808: Unblock 93.192.0.0/10 ([[phab:T421894|T421894]]) * 19:25 dancy: Updating buildkitd to v0.29.0 in gitlab-cloud-runners (prod) ([[phab:T415284|T415284]]) * 17:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/97 ([[phab:T420441|T420441]]) * 17:39 bd808: Unblock 94.134.0.0/15 ([[phab:T421866|T421866]]) * 16:31 dancy: Upgrade buildkit to 0.29.0 in staging gitlab-cloud-runners ([[phab:T415284|T415284]]) * 10:47 taavi: integration-castor05: free up a bit of disk space by deleting cache for AhoCorasick/ CLDRPluralRuleParser/ HtmlFormatter/ RelPath/ RunningStat/ IPSet/ === 2026-03-30 === * 22:01 bd808: Unblock 78.20.0.0/14 ([[phab:T421586|T421586]]) * 21:04 bd808: Unblock 95.88.0.0/15 ([[phab:T421774|T421774]]) * 20:49 bd808: Unblock 95.89.191.0/24 ([[phab:T421774|T421774]]) * 20:29 bd808: Unblock 73.162.0.0/16 ([[phab:T421549|T421549]]) * 13:10 hashar: gerrit: abandon mediawiki/core changes that are 2+years old and are attached to a task (`Bug: Txxxx`) * 11:37 hashar: Reloaded Zuul to add 3 persons to the allow list * 10:43 James_F: Docker: Re-pushing to try to create quibble-coverage 1.16.0-s2 === 2026-03-27 === * 21:00 James_F: Docker: [quibble-bullseye] Drop Python 2 from images * 11:28 hashar: deployment-prep: removed block for `143.176.0.0/15` and blocked subblock `143.176.0.0/16` instead.
This unblocks `143.177.0.0/16` # [[phab:T421420|T421420]] * 00:18 bd808: Unblock 95.90.238.0/23 ([[phab:T421447|T421447]]) === 2026-03-26 === * 21:25 bd808: Unblock 89.240.0.0/15 ([[phab:T421364|T421364]]) * 21:09 brennen: patchdemo: deploy to production for https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/merge_requests/312 === 2026-03-25 === * 20:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256318 [[phab:T421283|T421283]] * 15:46 dancy: Migrated gitlab-cloud-runners (prod) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 15:32 dancy: Migrated gitlab-cloud-runners (staging) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 10:01 hashar: Updating tox Jenkins jobs to add support for Python 3.14 {{!}} https://gerrit.wikimedia.org/r/1260632 {{!}} [[phab:T421209|T421209]] * 08:40 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20/ === 2026-03-24 === * 19:40 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255746 * 15:34 brennen: gitlab1004: manual test run of `configure-projects` with cleared issue allowlist ([[phab:T412882|T412882]]) * 15:26 bd808: Unblock 47.194.0.0/16 ([[phab:T421127|T421127]]) * 12:53 hashar: integration: deleted old Puppet 5 compiler agents from Jenkins ( pcc-worker1014.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1015.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1016.puppet-diffs.eqiad1.wikimedia.cloud ) # [[phab:T367399|T367399]] * 07:42 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1259755 === 2026-03-23 === * 15:28 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 90272 === 2026-03-22 === * 14:52 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1258082 * 01:00 Krinkle: Reloading Zuul to deploy 
https://gerrit.wikimedia.org/r/1256488 === 2026-03-21 === * 08:10 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256962 * 07:48 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256946 === 2026-03-20 === * 21:21 bd808: Unblock 103.159.218.0/24 ([[phab:T420530|T420530]]) * 14:59 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add dependency on CodeMirror, for [[phab:T399673|T399673]] === 2026-03-19 === * 16:54 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255777 * 16:01 Krinkle: Hoist l10n-bot rights from labs/tools parent to labs parent to reduce duplication in other labs/ repos * 15:50 Krinkle: Create labs/xtools repo (branch: main, parent: labs, owner: labs-xtools), ref [[phab:T402086|T402086]] === 2026-03-18 === * 21:11 dcausse: [[phab:T403775|T403775]]: reindexing all wikis to enable new sorting options * 21:08 dcausse: restarting opensearch on deployment-cirrussearch(12{{!}}13{{!}}14) instances to pickup new plugin versions * 14:56 James_F: Zuul: Handle wmf/next the same way as wmf/branch_cut_pretest * 14:52 James_F: Zuul: [GrowthExperiments] drop duplicate VisualEditor dep * 14:52 James_F: Zuul: [search/*] Add experimental Java 25 jobs === 2026-03-17 === * 22:50 James_F: Zuul: [mediawiki/extensions/JsonForms] Add quibble jobs * 21:27 James_F: Zuul: search: Update opensearch plugins for Java 11/17, for [[phab:T420407|T420407]] * 20:20 bd808: Resize deployment-sessionstore06 from g4.cores1.ram2.disk20 to g4.cores2.ram4.disk20 ([[phab:T415021|T415021]]) * 16:43 James_F: Zuul: [BlueSpicePermissionManager] Add …ConfigManager & …UserManager deps * 14:36 James_F: Zuul: [mediawiki/extensions/ArticleGuidance]: Add SpamBlacklist as phan dep, for [[phab:T420015|T420015]] === 2026-03-13 === * 13:59 andrewbogott: deleting ptr record 117.0.16.172.in-addr.arpa.
-- accidental duplicate for deployment-kafka-logging01.deployment-prep.eqiad1.wikimedia.cloud * 13:04 elukey: re-create kafka-logging-01 in deployment-prep on trixie and Kafka 3.7 (was running on buster) * 09:13 elukey: upgrade kafka-jumbo and kafka-main to Confluent 7.7 in deployment-prep (pre-requisite before being able to upgrade to Trixie) === 2026-03-12 === * 21:23 bd808: Hard reboot deployment-sessionstore06 ([[phab:T415021|T415021]]) * 01:14 James_F: Docker: [helm-linter] Bump for Envoy 1.35.9, for [[phab:T419637|T419637]] === 2026-03-11 === * 16:48 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/MetricsPlatform # [[phab:T417568|T417568]] * 16:47 James_F: Zuul: [mediawiki/extensions/MetricsPlatform] Archive, for [[phab:T416865|T416865]] * 11:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1250529 "inference-services: Split policy violation CI into separate model jobs." - [[phab:T418832|T418832]] === 2026-03-10 === * 17:39 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner production * 17:11 hashar: Updated MediaWiki coverage jobs so that they now keep "Generate a local configuration by running `composer phpunit:config`" message # [[phab:T419073|T419073]] * 16:41 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner staging * 08:21 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 === 2026-03-09 === * 21:53 bd808: Reboot deployment-shellbox01 on the off chance that it makes the new permissions error go away ([[phab:T419440|T419440]]) * 13:13 James_F: Zuul: [mediawiki/extensions/WikiShare] Mark as archived, for [[phab:T413589|T413589]] * 13:11 James_F: Zuul: [mediawiki/extensions/Memento] Mark as archived, for [[phab:T369991|T369991]] * 13:10 James_F: Zuul: [mediawiki/extensions/QuickGV] Mark as archived, for [[phab:T413348|T413348]] * 13:10 James_F: Zuul: [mediawiki/extensions/SemanticImageInput] Mark as archived, for
[[phab:T413588|T413588]] * 13:09 James_F: Zuul: [mediawiki/extensions/SidebarDonateBox] Mark as archived, for [[phab:T413587|T413587]] * 13:07 James_F: Zuul: [mediawiki/extensions/SemanticSifter] Mark as archived, for [[phab:T413586|T413586]] * 13:06 James_F: Zuul: [mediawiki/extensions/GoogleAdSense] Mark as archived, for [[phab:T413585|T413585]] * 13:04 James_F: Zuul: [mediawiki/extensions/SecurityAPI] Mark as archived, for [[phab:T418008|T418008]] * 12:50 James_F: Zuul: [mediawiki/extensions/CheckUser] Add DiscussionTools dependency * 12:50 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add dependencies for TestKitchen * 10:40 hashar: gerrit: mediawiki/vendor: converted `es6` and `es710` branches to tags # [[phab:T417804|T417804]] * 09:24 hashar: Updating Quibble jobs to 1.16.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1248880 {{!}} [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 09:15 hashar: updating all CI Jenkins jobs using `./jjb-update` === 2026-03-06 === * 19:46 James_F: Zuul: [mediawiki/services/geoshapes] Mark as archived, for [[phab:T418372|T418372]] * 16:37 hashar: Building Docker images for Quibble 1.16.0 * 16:31 hashar: Tag Quibble 1.16.0 @ {{Gerrit|0b9db5fe3cabb2cec0b5d44e128bafa917b3b895}} # [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 12:32 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248411 "jjb, Zuul: vary Wikibase Selenium for release branches" {{!}} [[phab:T418797|T418797]] * 12:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248409/ "jjb, Zuul: rename wikibase-selenium job for clarity" {{!}} [[phab:T418797|T418797]] === 2026-03-05 === * 14:41 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add TestKitchen as a dependency for [[phab:T418053|T418053]] * 08:01 hashar: Reloaded Zuul to rename wikibase-client / wikibase-repo jobs {{!}} https://gerrit.wikimedia.org/r/1238317 * 00:04 James_F: Docker: 
[quibble-coverage] Use local PHPUnit config, for [[phab:T345481|T345481]] === 2026-03-04 === * 21:16 James_F: Zuul: [mediawiki/core] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 21:10 James_F: Zuul: [mediawiki/vendor] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 19:48 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/96 ([[phab:T419004|T419004]]) * 18:50 James_F: Revert "Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency", for [[phab:T419043|T419043]] * 16:23 James_F: Zuul: [mediawiki/services/parsoid] Make PHP 8.4 voting * 15:37 James_F: Docker: [rake-ruby2.7] Add libffi-dev too, for [[phab:T418463|T418463]] * 13:59 James_F: Docker: [rake-ruby2.7] Add ruby-ffi for [[phab:T418463|T418463]] * 13:54 hashar: SIGKILL Zuul cause it can't gracefully stop most probably due to being locked attempting to report back to Gerrit # [[phab:T419009|T419009]] * 13:49 hashar: Stopping Zuul # [[phab:T419009|T419009]] * 13:41 hashar: Took a Zuul stack dump on contint1002.wikimedia.org using SIGUSR1 # [[phab:T419009|T419009]] === 2026-03-03 === * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] Drop MetricsPlatform phan dep * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Drop MetricsPlatform phan dep === 2026-03-02 === * 22:13 James_F: Zuul: Enforce PHP 8.4 in MW extensions and skins for development branch, for [[phab:T386108|T386108]] * 14:05 James_F: Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency, for [[phab:T415451|T415451]] * 13:48 James_F: Zuul: […/WikimediaEvents] Drop LoginNotify dependency, now unused, for [[phab:T404334|T404334]] * 10:16 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/15.8.2/ # [[phab:T418718|T418718]] === 
2026-02-28 === * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/skins` # [[phab:T418675|T418675]] * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/extensions` # [[phab:T418675|T418675]] === 2026-02-27 === * 15:53 dancy: Updating gitlab-cloud-runners (staging and prod) to gitlab-runner 18.9.0. === 2026-02-26 === * 20:16 James_F: Zuul: Provide a custom, high-priority pipeline just for puppet compiler [[phab:T414621|T414621]] * 19:32 James_F: Docker: Bump all the PHPs. * 13:40 hashar: Deployed Jenkins job https://integration.wikimedia.org/ci/job/wikibase-selenium/ # [[phab:T287582|T287582]] * 00:13 dduvall: forcing replacement of buildkitd helm release in gitlab-cloud-runner prod cluster due to dependency on removed k8s secret ([[phab:T416260|T416260]]) === 2026-02-25 === * 23:50 dduvall: deploying https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/552 to gitlab-cloud-runner production cluster ([[phab:T416260|T416260]]) * 14:07 James_F: Zuul: [mediawiki/extensions/CommunityRequests] Add TemplateData dependency, for [[phab:T401638|T401638]] * 00:08 jeena: no-op testing updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/95 === 2026-02-24 === * 15:55 brennen: devtools: test deploy phab/phorge to test instance ([[phab:T418256|T418256]]) === 2026-02-23 === * 23:07 jeena: Updated development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:43 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:12 bd808: Unblock 191.80.192.0/18 ([[phab:T418132|T418132]]) * 20:26 hashar: Deleted "replication-upstream" Grafana dashboard in favor of a copy/new "replication" one. 
https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream , replaced it by https://grafana.wikimedia.org/d/d4a4da73-c27f-4ce6-a9e5-ab84dd7a4ebb/replication * 16:29 James_F: Zuul: [3d2png] Add basic Node CI at version 20 === 2026-02-20 === * 21:47 bd808: Unblock 168.184.84.0/24 ([[phab:T418020|T418020]]) * 17:13 bd808: Unblock 122.187.64.0/18 ([[phab:T417964|T417964]]) * 14:35 James_F: Zuul: [mediawiki/extensions/Monstranto] Move out of Wikimedia prod section === 2026-02-19 === * 18:34 bd808: Unblock 181.98.0.0/16 ([[phab:T417890|T417890]]) * 17:21 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Add AbuseFilter as a dependency, for [[phab:T417799|T417799]] * 13:22 hashar: Reloaded Zuul to archive the Cergen repository {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1240688 {{!}} [[phab:T417887|T417887]] === 2026-02-18 === * 20:17 jeena: Updating development images on contint primary for [[phab:T415922|T415922]] * 19:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240360 * 18:40 bd808: Unblock 46.59.0.0/17 ([[phab:T417747|T417747]]) * 17:05 hashar: Regenerating Jenkins jobs with JJB based on https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 17:04 hashar: Added EXT_DEPENDENCIES to Quibble Jenkins jobs parameters so we can manually trigger them from the Web UI using a different set of deps # https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 16:30 hashar: Triggered https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ with empty Zuul parameters introduced by https://gerrit.wikimedia.org/r/1240333 {{!}} https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/4893/console * 15:43 James_F: Zuul: [mediawiki/extensions/ReadingLists] Add EventBus dependency for [[phab:T417706|T417706]] * 12:15 hashar: zuul-1001.zuul3.eqiad1.wikimedia.cloud: added keepalive=20 to the scheduler Gerrit driver and restarted scheduler container # [[phab:T417497|T417497]] * 06:58 jeena: 
Updating development images on contint primary for [[phab:T415922|T415922]] === 2026-02-17 === * 23:37 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240081 * 23:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240078 * 15:58 brennen: deployed latest phab/phorge wmf/stable to devtools test instance ([[phab:T417657|T417657]]) * 09:01 hashar: Reloaded Zuul to enable php 8.5 testing on utfnormal, php-session-serializer, wikipeg, mediawiki/libs/Dodo, mediawiki/libs/UUID, testing-access-wrapper and translatewiki # [[phab:T406326|T406326]] === 2026-02-16 === * 15:27 hashar: Manually cleaned some old workspaces on integration-agent-docker-1042 === 2026-02-12 === * 20:07 James_F: Zuul: Enable PHP 8.5 jobs for most MW libraries, for [[phab:T406326|T406326]] * 19:33 James_F: Docker: [php83] Re-build with upstream's new 8.3.30 release and cascade * 19:31 James_F: Zuul: Add PHP 8.5 CI job to various things noted as blocked by Phan, for [[phab:T410941|T410941]], [[phab:T406326|T406326]] * 16:35 Krinkle: Disable publishing noise on tasks from repos Bcp47, clover-diff, ScopedCallback, and IDLeDOM. 
Ref [[phab:T143162|T143162]] * 15:53 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/87 * 11:21 James_F: Zuul: [mediawiki/libs/shellbox] Add direct Phan job, for [[phab:T416064|T416064]] === 2026-02-10 === * 20:16 dancy: Rebooted k3s.catalyst-dev (it was unresponsive, but the reboot hasn't helped) === 2026-02-09 === * 21:58 James_F: Zuul: [mediawiki/tools/phan] Add PHP 8.5 CI job, for [[phab:T410941|T410941]] * 19:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1238006 [[phab:T415680|T415680]] * 11:51 James_F: Zuul: [mediawiki/extensions/ReadingLists] Drop MetricsPlatform dependency, for [[phab:T414435|T414435]] === 2026-02-05 === * 17:58 James_F: Zuul: […/WikimediaCustomizations] Add six new dependencies for [[phab:T404334|T404334]] * 15:35 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1237254 * 15:18 James_F: Zuul: […/OATHAuth] Add dependency and phan dependency on CentralAuth === 2026-02-04 === * 12:54 James_F: Zuul: [mediawiki/extensions/Petition] Add CLDR dependency * 10:03 hashar: Restarted Jenkins on releases2003.codfw.wmnet === 2026-02-02 === * 21:17 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1234926 "re-enable master jobs for some BlueSpice repos - [[phab:T403196|T403196]]" * 21:05 bd808: Unblock 85.146.0.0/17 ([[phab:T416079|T416079]]) * 19:47 James_F: Zuul: […/WikimediaCustomizations] Add cldr phan dependency, for [[phab:T404334|T404334]] * 17:33 bd808: Unblock 188.188.0.0/15 ([[phab:T416095|T416095]]) * 17:26 bd808: Unblock 85.94.84.0/22 ([[phab:T416105|T416105]]) * 17:09 bd808: Unblock 94.234.0.0/16 ([[phab:T416165|T416165]]) * 16:51 dancy: Update gitlab-runners to alpine-v18.6.6 ([[phab:T415214|T415214]]) * 16:27 bd808: Unblock 47.231.208.0/21 ([[phab:T416010|T416010]]) * 11:39 James_F: Zuul: […/WikimediaCustomizations] Add five new phan dependencies, for [[phab:T404334|T404334]] * 09:45 
Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 58532, 58557 === 2026-01-31 === * 21:49 James_F: Deleted Jenkins's job entry for castor-save-workspace-cache {{Gerrit|6193776}} and this seems to have unstuck things for [[phab:T416078|T416078]]? * 21:45 James_F: Running `sudo systemctl restart jenkins` on contint for [[phab:T416078|T416078]] * 21:44 James_F: Fighting [[phab:T416078|T416078]], took integration-castor-5 offline, disconnected, sshed in to kill threads, then reconnected; no change in aspect. * 19:03 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1235380 === 2026-01-28 === * 21:26 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn # [[phab:T415832|T415832]] * 21:11 bd808: Unblock 181.160.0.0/15 & 186.40.128.0/17 ([[phab:T415820|T415820]]) * 17:01 bd808: Unblock 102.182.0.0/16 ([[phab:T415782|T415782]]) === 2026-01-27 === * 16:45 James_F: Zuul: Switch skin-quibble template with identical extension-quibble, for [[phab:T402398|T402398]] * 16:18 James_F: Zuul: [ArticleGuidance] mention it will be in production * 15:55 James_F: Docker: [quibble-bullseye] Update to Quibble 1.15.0 * 15:12 James_F: Docker: [quibble-coverage] Pass PHPUnit config location explicitly, for [[phab:T395470|T395470]] * 09:18 hashar: integration: on integration-castor05, deleted caches for old MediaWiki branches * 09:15 hashar: integration: on pkgbuilder instances, removed Buster cow images, aptcache and hooks. 
`sudo cumin --force -p 0 'name:pkgbuilder' 'rm -fR /srv/pbuilder/<nowiki>{</nowiki>base-buster-amd64.cow,hooks/buster,aptcache/buster-amd64<nowiki>}</nowiki>'` # [[phab:T397209|T397209]] * 09:14 hashar: integration: cleaned up old workspaces under /srv/jenkins/workspace === 2026-01-26 === * 23:27 bd808: Unblock 66.130.0.0/15 ([[phab:T415596|T415596]]) * 22:52 bd808: Unblock 45.16.0.0/12 ([[phab:T415467|T415467]]) * 14:46 hashar: gerrit: changed `operations/software/permissions` project type from `CODE` to `PERMISSIONS` by pointing `HEAD` to `refs/meta/config` === 2026-01-22 === * 17:36 James_F: Docker: [quibble-coverage] Stop using legacy PHPUnit entrypoint ([[phab:T395470|T395470]]) & Stop excluding Dump/ParserFuzz/Stub groups ([[phab:T415230|T415230]]) * 15:11 James_F: Zuul: [mediawiki/extensions/Math] Add a standalone job, for [[phab:T415230|T415230]] === 2026-01-20 === * 20:38 bd808: Cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229186 ([[phab:T415113|T415113]]) * 19:05 bd808: Rebooted deployment-cache-text08 to see if the mystery haproxy startup failure would go away ([[phab:T415100|T415100]]) * 18:50 bd808: Unblock 152.7.0.0/16 ([[phab:T415100|T415100]]) === 2026-01-17 === * 23:32 ori: beta-scap with `php_l10n: true` completed successfully: https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/241466/console. PHP l10n files generated. Reverted local change to scap.cfg. * 23:26 ori: Temporarily set `php_l10n: true` on deployment-deploy04:/etc/scap.cfg to see if next scap succeeds. 
=== 2026-01-16 === * 16:33 dancy: Deleting deployment-mx03.deployment-prep ([[phab:T412975|T412975]]) === 2026-01-15 === * 14:50 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/ArticleSummaries/ # [[phab:T413232|T413232]] === 2026-01-14 === * 17:14 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226907 * 16:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226893 * 15:57 bd808: Unblock 190.60.63.0/24 ([[phab:T414541|T414541]]) === 2026-01-13 === * 15:04 James_F: Zuul: Make quibble-for-mediawiki-core-vendor-mysql-php84 voting, for [[phab:T386108|T386108]] === 2026-01-12 === * 21:33 zabe: zabe@deployment-mwmaint03:~$ foreachwiki migrateLinksTable.php --table imagelinks # [[phab:T413668|T413668]] * 21:06 bd808: Unblock 66.81.168.0/21 ([[phab:T414303|T414303]]) * 17:42 dancy: Turned off instance deployment-prep.deployment-mx03 * 11:44 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 46331, 46344 === 2026-01-10 === * 21:48 taavi: reload zuul for https://gerrit.wikimedia.org/r/1224782 * 00:25 bd808: Unblock 91.160.0.0/12 ([[phab:T414190|T414190]]) === 2026-01-09 === * 17:33 thcipriani: re-enabling beta update jobs after test bad extension-list [[phab:T411516|T411516]] * 17:09 thcipriani: disabling beta update jobs to test bad extension-list [[phab:T411516|T411516]] === 2026-01-08 === * 21:30 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224815 [[phab:T414136|T414136]] * 18:24 bd808: Unblock 89.80.0.0/12 ([[phab:T414113|T414113]]) * 15:55 dancy: Upgrading gitlab-runner to v18.5.0 on gitlab-cloud-runners. 
([[phab:T414053|T414053]]) === 2026-01-07 === * 23:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1082574 https://gerrit.wikimedia.org/r/1224157 https://gerrit.wikimedia.org/r/1224159 * 23:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/896311 [[phab:T27482|T27482]] * 23:06 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224218 * 17:34 James_F: Zuul: Add new extensions: IssueTrackerLinks, PreviewLinks, and WikiRAG * 17:34 James_F: Zuul: [labs/tools/heritage] Point to the task to drop 8.1 testing * 15:09 James_F: Zuul: [labs/tools/heritage] Add testing in PHP 8.2+, not just PHP 8.1 * 15:03 James_F: Zuul: Even for extension-broken, don't offer PHP 8.1 testing * 15:02 James_F: Zuul: Move quibble experimental sqlite/postgres tests to PHP 8.3 === 2026-01-06 === * 16:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223690 [[phab:T411814|T411814]] * 16:16 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223189 [[phab:T411814|T411814]] * 00:30 bd808: Unblock 85.134.128.0/17 ([[phab:T413755|T413755]]) * 00:02 bd808: Unblock 89.166.128.0/17 ([[phab:T413702|T413702]]) === 2026-01-05 === * 23:57 bd808: Unblock 185.233.104.0/22 ([[phab:T413472|T413472]]) * 23:51 bd808: Unblock 45.62.112.0/21 ([[phab:T413079|T413079]]) * 23:44 bd808: Unblock 85.134.200.0/21 ([[phab:T413067|T413067]]) * 19:03 dancy: Updated buildkitd to v0.26.3 in gitlab-cloud-runners * 14:27 taavi: reload zuul for {{Gerrit|1223191}} * 13:57 James_F: Zuul: [mediawiki/php/wmerrors] Enable PHP 8.5 testing, for [[phab:T410921|T410921]] === 2026-01-03 === * 17:59 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222709 https://gerrit.wikimedia.org/r/1220388 https://gerrit.wikimedia.org/r/1219140 === 2026-01-02 === * 17:10 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222597 === 2026-01-01 === * 02:34 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1221644 <noinclude>'''Server Admin 
Log''' logged from {{IRC|wikimedia-releng}} for [[Nova Resource:Deployment-prep|Beta Cluster]], [[mw:Continuous integration|Continuous integration]] and various other Release Engineering projects.</noinclude> {{SAL-archives/Release Engineering}} <noinclude>[[Category:SAL]]</noinclude> 1b9v15lt4b7x5gx78j2d2gryis0m98n 2414286 2414285 2026-05-15T18:36:11Z Stashbot 7414 dancy: Upgraded gitlab-cloud-runners (prod) from 1.35.1-do.5 to 1.35.1-do.6 (T426436) 2414286 wikitext text/x-wiki === 2026-05-15 === * 18:36 dancy: Upgraded gitlab-cloud-runners (prod) from 1.35.1-do.5 to 1.35.1-do.6 ([[phab:T426436|T426436]]) * 18:24 dancy: Upgrading gitlab-cloud-runners (prod) from 1.35.1-do.5 to 1.35.1-do.6 ([[phab:T426436|T426436]]) * 18:11 dancy: Upgraded gitlab-cloud-runners (staging) from 1.35.1-do.5 to 1.35.1-do.6 ([[phab:T426436|T426436]]) * 17:59 dancy: Upgrading gitlab-cloud-runners (staging) from 1.35.1-do.5 to 1.35.1-do.6 ([[phab:T426436|T426436]]) * 13:02 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1287828 [[phab:T426392|T426392]] === 2026-05-13 === * 12:42 James_F: Zuul: [mediawiki/extensions/Springboard] Add AdminLinks Phan dependency * 12:42 James_F: Zuul: [mediawiki/extensions/ChatBot] Add dependencies on VisualEditor and BlueSpiceFoundation * 12:42 James_F: Zuul: [mediawiki/extensions/ChatIntegration] Add dependency on VisualEditor * 12:37 James_F: Zuul: [mediawiki/extensions/WikiLambda] Drop AF and SB deps down to phan-only, for [[phab:T423180|T423180]] === 2026-05-12 === * 20:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/104 ([[phab:T424774|T424774]]) * 18:08 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add AF and SB deps for [[phab:T423180|T423180]] * 14:18 atsukoito: PrivateSettings: empty $wgOpensearchCredentials for opensearch-on-k8s synced to deploy04 by Reedy * 13:04 atsukoito: PrivateSettings: credentials for opensearch-on-k8s ttmserver-test * 11:50 
James_F: Zuul: [machinelearning/liftwing/inference-services] Add qwen36 llm model CI/CD pipelines, for [[phab:T425680|T425680]] * 11:46 James_F: Zuul: Add experimental php-pie-build* jobs to other PHP extensions, for [[phab:T425943|T425943]] * 11:37 James_F: Zuul: [mediawiki/php/wikidiff2] Add experimental php-pie-build* jobs, for [[phab:T425943|T425943]] * 10:05 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-with-Wikibase-extensions-browser-tests-only-vendor-php83 # fix failure seen in quibble-with-Wikibase-extensions-browser-tests-only-vendor-php83 7817 * 08:44 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/ # broken Cypress cache? hopefully fix failure seen in quibble-vendor-mysql-php83-selenium 51633 === 2026-05-11 === * 18:28 James_F: Docker: Add changes to php-compile images for PIE, for [[phab:T425943|T425943]] * 16:06 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/ # broken Cypress cache? 
hopefully fix failure seen in quibble-vendor-mysql-php83-selenium 51439 and 51452 === 2026-05-09 === * 20:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1285498 === 2026-05-07 === * 22:53 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/105 === 2026-05-06 === * 18:13 bd808: Unblock 88.165.192.0/19 * 18:03 bd808: Unblock 94.208.0.0/14 * 17:56 bd808: Unblock 84.226.0.0/16 * 17:41 bd808: Unblock 94.34.0.0/16 * 17:35 bd808: Unblock 109.134.0.0/16 === 2026-05-05 === * 21:20 James_F: Zuul: Provide Node 26 experimental jobs everywhere needed * 21:04 James_F: Docker: Provide initial Node 26 images * 19:01 James_F: Zuul: [mediawiki/extensions/PageAssessments] Add Scribunto dependency, for [[phab:T396135|T396135]] * 14:58 dancy: rm /var/log/<nowiki>{</nowiki>user.log.1,syslog.1,messages.1<nowiki>}</nowiki> on deployment-eventgate-4.deployment-prep ([[phab:T425429|T425429]]) === 2026-05-04 === * 15:19 dancy: Upgrading gitlab cloud runners (prod) from 1.35.1-do.3 to 1.35.1-do.5 * 14:51 dancy: Upgrading gitlab cloud runners (staging) from 1.35.1-do.3 to 1.35.1-do.5 * 10:40 James_F: Zuul: Provide non-voting PHP 8.4/8.5 Quibble jobs for bluespice template === 2026-05-02 === * 20:49 James_F: Zuul: [mediawiki/core] Enforce PHP 8.4 & 8.5 on release branches, all pass * 19:27 James_F: Zuul: Provide non-voting PHP 8.4/8.5 Quibble jobs for MW release branches * 19:19 James_F: Zuul: [mediawiki/extensions/BlogPage] Add dependencies * 16:48 James_F: Hard-restarting Zuul to clear the huge number of i18n updates being re-submitted. 
* 15:48 James_F: Zuul: [wikimedia-cz/*] Test in PHP 8.3+, dropping 8.2 * 14:02 TheresNoTime: Add bvibber to deployment-prep project * 09:08 James_F: Docker: [quibble-*] Add php-luasandbox so we can test both modes in Scribunto === 2026-05-01 === * 15:42 James_F: Zuul: [wikimedia/lucene-explain-parser] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: Zuul: [wikimedia/textcat] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: Zuul: [mediawiki/tools/ParseWiki] Test in PHP 8.3+, dropping 8.2 * 15:42 James_F: zuul: Add ToprakM to CI allowlist * 15:19 James_F: Zuul: [translatewiki] Test in PHP 8.3+, dropping 8.2 * 15:10 James_F: Zuul: [mediawiki/extensions/WikiEditor] Add TestKitchen as a dependency, for [[phab:T425076|T425076]] * 12:40 James_F: Zuul: [mediawiki/tools/code-utils] Test in PHP 8.3+, dropping 8.2 * 08:02 James_F: Zuul: Update xtex's e-mail in the allowlist * 07:37 James_F: Zuul: Switch release branches' selenium jobs to PHP 8.3 * 07:33 James_F: Zuul: Test Wikimedia production libraries in PHP 8.3+, dropping 8.2 === 2026-04-30 === * 21:36 brennen: gitlab-webhooks: building & restarting to deploy https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/40 * 20:26 James_F: Zuul: [mediawiki/tools/api-testing] Make PHP 8.5 CI voting * 20:16 James_F: jforrester@doc1004:~$ # sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn/ # [[phab:T415832|T415832]] * 20:14 James_F: Zuul: [mediawiki/extensions/WebAuthn] Archive, for [[phab:T415832|T415832]] / [[phab:T303495|T303495]] * 17:16 brennen: wikibugs: most maintainers at hackathon, so go release-engineering added as a maintainer while looking to debug error at https://gitlab.wikimedia.org/toolforge-repos/wikibugs2/-/jobs/810904 * 15:19 mutante: upgrading zuul to 14.2.0-1 on "new zuul" machines ([[phab:T424879|T424879]]) === 2026-04-29 === * 15:49 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add ConfirmEdit dependency, for [[phab:T424597|T424597]] * 15:36 James_F: Zuul: Drop 
experimental node22 jobs, never used in practice * 15:28 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1279392, https://gerrit.wikimedia.org/r/1279397 === 2026-04-28 === * 18:11 bd808: Unblock 86.0.0.0/16 * 17:41 bd808: Unblock 79.192.0.0/10 * 17:07 James_F: Zuul: [mediawiki/tools/phpunit-patch-coverage] Drop PHP 8.2 testing * 17:07 James_F: Zuul: [mediawiki/tools/minus-x] Drop PHP 8.2 testing * 17:07 James_F: Zuul: [mediawiki/tools/codesniffer] Drop PHP 8.2 testing * 16:32 James_F: Zuul: [mediawiki/services/jobrunner] Drop PHP 8.2 testing * 13:34 James_F: Zuul: [mediawiki/tools/phan] Drop PHP 8.2 testing * 13:34 James_F: Zuul: [oojs/ui] Drop PHP 8.2 testing * 13:14 James_F: Zuul: [mediawiki/tools/phan/SecurityCheckPlugin] Drop PHP 8.2 CI * 10:40 Silvan_WMDE: sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwext-node24-rundoc/ # run on integration-castor06.integration.eqiad1.wikimedia.cloud to fix failure seen in mwext-node24-rundoc #1717 * 00:03 bd808: Increase parallelism for wmf-beta-update-databases.py ([[phab:T256168|T256168]]) === 2026-04-27 === * 22:11 bd808: Beta Cluster MediaWiki update logs now available via https://beta-update.wmcloud.org/ ([[phab:T256168|T256168]]) * 21:57 bd808: Add web security group to deployment-deploy04 ([[phab:T256168|T256168]]) * 20:45 James_F: Zuul: Restrict mw*-codehealth-patch jobs to master only, for [[phab:T424573|T424573]] * 17:16 James_F: Docker: [mediawiki-phan-taint-check-demo] Re-platform to Trixie and so PHP 8.4 * 15:53 James_F: Zuul: [mediawiki/extensions/ReportIncident] Add TestKitchen phan dependency, for [[phab:T424220|T424220]] * 14:32 James_F: Zuul: Drop PHP 8.2 enforcement from MediaWiki things for master and REL1_46 for [[phab:T358667|T358667]] * 12:38 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mwext-node24-docs-publish # fix failure seen in mwext-node24-docs-publish 
383 * 09:18 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover/mediawiki-libs-node-cssjanus/ # [[phab:T424419|T424419]] === 2026-04-26 === * 20:49 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1276777 === 2026-04-24 === * 22:48 dduvall: merged zuul3 branch of integration/config into master and pushed (in preparation for https://gerrit.wikimedia.org/r/c/operations/puppet/+/1277198) * 12:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1276428 === 2026-04-23 === * 23:57 bd808: Set `profile::beta::autoupdater::run_updater: true` for deployment-deploy04 via Horizon ([[phab:T256168|T256168]]) * 22:58 bd808: bd808@deployment-deploy04 `sudo -u jenkins-deploy /usr/local/bin/wmf-beta-update-all` * 22:36 bd808: bd808@deployment-deploy04 `sudo -u mwdeploy /usr/local/bin/wmf-beta-update-all` * 22:16 bd808: Disabled https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad so that replacement script can be tested ([[phab:T256168|T256168]]) * 22:12 bd808: Disabled https://integration.wikimedia.org/ci/job/beta-code-update-eqiad so that replacement script can be tested ([[phab:T256168|T256168]]) * 22:02 bd808: Cherry-picked {{gerrit|1276813}} to deployment-puppetserver-1 ([[phab:T256168|T256168]]) * 20:11 James_F: Zuul: [wikibase/*] Replace CI testing in Node 20 with Node 24 * 20:11 James_F: Zuul: [wikidata/query/*] Replace CI testing in Node 20 with Node 24 * 20:11 James_F: Zuul: [analytics/*] Replace CI testing in Node 20 with Node 24 * 20:10 James_F: Zuul: [mediawiki/tools/*] Replace CI testing in Node 20 with Node 24 * 20:06 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.34.5-do.3 to 1.35.1-do.3 ([[phab:T423726|T423726]]) * 19:55 James_F: Zuul: [jquery-client] Replace CI testing in Node 20 with Node 24 * 19:51 James_F: Zuul: [wikipeg] Drop testing in Node 20 and Node 22 * 19:47 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.34.5-do.3 to 1.35.1-do.3 ([[phab:T423726|T423726]]) * 
19:37 James_F: Zuul: [oojs/ui] Drop CI testing in Node 20 and Node 22 * 19:37 James_F: Zuul: [oojs/js] Drop CI testing in Node 20 and Node 22 * 19:37 James_F: Zuul: [unicodejs] Replace CI testing in Node 20 with Node 24 * 19:36 James_F: Zuul: [wikimedia/portals] Drop CI testing in Node 20 and Node 22 * 18:57 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.33.9-do.3 to 1.34.5-do.3 ([[phab:T423726|T423726]]) * 18:39 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.33.9-do.3 to 1.34.5-do.3 ([[phab:T423726|T423726]]) * 18:18 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.33.9-do.2 to 1.33.9-do.3 ([[phab:T423726|T423726]]) * 17:58 James_F: Zuul: [mediawiki/extensions/OAuth] Add dependency on CentralAuth, for [[phab:T415281|T415281]] * 17:56 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.32.13-do.2 to 1.33.9-do.3 ([[phab:T423726|T423726]]) * 16:35 James_F: Zuul: Enforce PHP 8.5 CI for MW things in master (and REL1_46), for [[phab:T411814|T411814]] * 16:19 James_F: Zuul: [mediawiki/services/parsoid] Enable PHP 8.5 CI * 15:47 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add AntiSpoof dependency, for [[phab:T420548|T420548]] * 14:20 Lucas_WMDE: ssh integration-castor06.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24 # fix failure seen in mediawiki-node24 8385 and 8405 * 12:56 James_F: Zuul: [mediawiki/extensions/GrowthExperiments] Add CentralNotice dependency, for [[phab:T422082|T422082]] === 2026-04-22 === * 00:07 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add MF dependency, for [[phab:T424113|T424113]] === 2026-04-21 === * 23:26 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add CommunityConfiguration dep too, for [[phab:T394410|T394410]] * 23:17 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Add standalone test jobs, for [[phab:T422031|T422031]] * 20:47 inflatador: updating cirrussearch hosts to Trixie/OpenSearch 2 
[[phab:T421763|T421763]] * 20:38 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add CommunityConfiguration phan dep, for [[phab:T394410|T394410]] * 20:17 bd808: Running tofu for [[phab:T421244|T421244]] * 18:00 James_F: Zuul: [mediawiki/extensions/WatchAnalytics] Add ApprovedRevs Phan dependency * 16:35 bd808: Unblock 79.116.0.0/16 * 13:34 James_F: Zuul: [mediawiki/extensions/WikiLambda] Add TestKitchen phan dep, for [[phab:T415254|T415254]] * 13:27 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add CentralAuth dependency, for [[phab:T420548|T420548]] === 2026-04-20 === * 23:56 bd808: Unblock 76.157.0.0/16 * 18:28 dancy: Upgrading gitlab cloud runners (staging) to 1.33.9-do.2 ([[phab:T423726|T423726]]) * 18:28 dancy: Upgrading gitlab cloud runners (staging) ([[phab:T423726|T423726]]) * 18:19 James_F: jjb: All 486 (!) jobs now updated for [[phab:T423622|T423622]] * 18:18 bd808: Unblock 113.128.0.0/15 * 15:03 James_F: Docker: Bump ci-bullseye/-bookworm/-trixie for mirrors.wm.org removal, [[phab:T423622|T423622]] === 2026-04-19 === * 19:53 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272752 === 2026-04-17 === * 21:07 thcipriani: marking integration-agent-1080 offline for experimentation * 19:30 thcipriani: reconfiguring castor-save-workspace-cache with https://gerrit.wikimedia.org/r/1273935 * 17:47 dancy: Upgrading gitlab cloud runners (prod) k8s from 1.32.10-do.1 to 1.32.13-do.2 ([[phab:T423726|T423726]]) * 16:49 dancy: Upgrading gitlab cloud runners (staging) k8s from 1.32.10-do.1 to 1.32.13-do.2 ([[phab:T423726|T423726]]) === 2026-04-16 === * 20:49 dduvall: creating integration/zuul-jobs repo to serve as a mirror of opendev.org/zuul/zuul-jobs ([[phab:T406384|T406384]]) * 13:38 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1272711 [[phab:T423568|T423568]] * 11:07 Silvan_WMDE: sudo -u jenkins-deploy rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node24/ # run on 
integration-castor06.integration.eqiad1.wikimedia.cloud === 2026-04-15 === * 20:05 James_F: Zuul: Configure REL1_46 CI, for [[phab:T423257|T423257]] * 17:44 bd808: Unblock 176.0.0.0/13 * 17:39 bd808: Unblock 46.128.0.0/16 * 17:32 bd808: Unblock 176.86.0.0/16 * 16:39 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/commit/127d783b2176ac60b646a5fa4f1b1a872ca66340 * 15:33 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/100 * 01:02 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/99 === 2026-04-14 === * 20:42 James_F: Docker: [composer-scratch] Upgrade composer to 2.9.7 and cascade * 16:35 bd808: Unblock 88.112.0.0/14 * 00:48 bd808: Unblock 24.6.0.0/16 * 00:42 bd808: Unblock 152.231.48.0/20 === 2026-04-13 === * 22:00 James_F: Zuul: [mediawiki/vendor] Drop accidental Wikibase browser tests on branches * 20:28 James_F: Zuul: [mediawiki/extensions/Chart] Drop Doxygen publish job, not used * 14:42 James_F: Zuul: [mediawiki/extensions/WikimediaCustomizations] Add FlaggedRevs dep, for [[phab:T421011|T421011]] === 2026-04-12 === * 18:21 James_F: jforrester@contint1002:~$ sudo /usr/sbin/service zuul restart && tail -f -n100 /var/log/zuul/zuul.log # [[phab:T423027|T423027]] === 2026-04-10 === * 23:22 James_F: jforrester@contint1002:~$ zuul enqueue --trigger gerrit --pipeline postmerge --project mediawiki/extensions/ReadingLists --change {{Gerrit|1269498}},2 # [[phab:T422976|T422976]] * 23:20 James_F: Zuul: [mediawiki/extensions/ReadingLists] Publish JS coverage, for [[phab:T422976|T422976]] * 23:13 James_F: Zuul: Migrate a few straggler Node 20 MediaWiki things to Node 24 * 23:01 James_F: Zuul: Move all MediaWiki things from mediawiki-node20 to mediawiki-node24 * 21:59 James_F: Docker: Bump Node base images to March releases and cascade; Upgrade Quibble 
images from Node 20 to Node 24 * 10:24 hashar: Updating all Quibble jobs to 1.17.1 * 10:22 hashar: Updated PostgreSQL jobs to Quibble 1.17.1 # [[phab:T422110|T422110]] * 10:22 hashar: Updated apitesting job to Quibble 1.17.1 # [[phab:T422843|T422843]] [[phab:T418743|T418743]] * 09:51 hashar: Tag Quibble 1.17.1 @ {{Gerrit|0a1ab3b7c3dfee36c9bc2e9b049957d94e190e85}} === 2026-04-09 === * 15:13 hashar: Rolling back Quibble jobs to 1.16.0 (api-testing stage fails due to missing npm install step) * 14:58 hashar: Upgrading Quibble jobs to 1.17.0 * 14:23 hashar: Tagged Quibble 1.17.0 @ {{Gerrit|864381c6b63bdbcd8c74a3162c406fffcaaf8694}} * 07:48 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1268559 "Zuul: use standalone jobs for GrowthExperiments Cypress tests" {{!}} [[phab:T417412|T417412]] === 2026-04-08 === * 22:19 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/1269068 * 22:01 bd808: Unblock 95.216.12.170/32 ([[phab:T422751|T422751]]) * 19:26 brennen: gitlab-webhooks: building & deploying https://gitlab.wikimedia.org/repos/releng/gitlab-webhooks/-/merge_requests/37 - hitting some build tooling stuff, trying a fix per instructions in the error log * 17:54 bd808: Unblock 167.56.0.0/13 ([[phab:T422721|T422721]]) * 06:31 hashar: Deleted integration-agent-castor05 Bullseye instance, replaced by integration-agent-castor06 which is on Bookworm # [[phab:T421114|T421114]] * 06:24 hashar: Deleted integration-agent-qemu-1003 Bullseye image, replaced by integration-agent-qemu-1004 which is on Bookworm # [[phab:T422488|T422488]] === 2026-04-07 === * 22:25 dduvall: adding new pipelinelib labels to ci nodes ([[phab:T422234|T422234]]) * 20:05 hashar: Triggered a build of https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen/ * 17:06 dduvall: added `Docker` label to `contint` jenkins nodes ([[phab:T422507|T422507]]) * 17:05 dduvall: restored missing `pipelinelib` labels on
`integration-agent-docker-` CI hosts ([[phab:T422507|T422507]]) * 16:53 bd808: Unblock 73.0.0.0/8 ([[phab:T422498|T422498]]) * 12:36 hashar: jjb: use $CASTOR_HOST for Quibble success cache. https://gerrit.wikimedia.org/r/1268545 {{!}} This causes the Quibble jobs to use a new instance for the success cache, which is empty # [[phab:T383243|T383243]] [[phab:T421114|T421114]] * 12:17 hashar: Migrated Castor from integration-castor05 to integration-castor06. Updated CASTOR_HOST in Jenkins and moved the Cinder volume to the new instance # [[phab:T421114|T421114]] * 11:14 hashar: Added Bookworm based Jenkins agents to the pool Hostnames 1090, 1091, 1092 and 1093 # [[phab:T421114|T421114]] * 10:09 hashar: Added Bookworm based Jenkins agents to the pool Hostnames 1083 to 1089 # [[phab:T421114|T421114]] * 07:23 hashar: CI Jenkins: removed `blubber` label from all agents after having moved PipelineLib to use the `Docker` label {{!}} [[phab:T422234|T422234]] === 2026-04-06 === * 16:01 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/1268239 === 2026-04-03 === * 20:17 bd808: Unblock 2.54.0.0/16 ([[phab:T422238|T422238]]) * 17:25 bd808: Unblock 31.18.0.0/16 ([[phab:T422245|T422245]]) * 17:18 bd808: Unblock 2.54.128.0/19 ([[phab:T422238|T422238]]) * 16:18 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1264649 "add Python 3.14 to pywikibot jobs and separate lint tests" {{!}} [[phab:T421723|T421723]] * 09:26 hashar: integration: nuked pywikibot/core pre-commit cache # [[phab:T422242|T422242]] * 09:15 hashar: Added Bookworm based Jenkins agents to the pool with label `Docker`.
Hostnames are `integration-agent-docker-107*` # [[phab:T421114|T421114]] * 02:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1267398 === 2026-04-02 === * 16:50 thcipriani: restart jenkins * 15:15 bd808: Unblock 82.216.0.0/16 ([[phab:T421508|T421508]]) * 15:07 bd808: Unblock 95.90.0.0/15 ([[phab:T421485|T421485]]) * 11:19 James_F: Zuul: [oojs/ui] Drop ooui-ruby2.7-rake job, we're abandoning Ruby use there === 2026-04-01 === * 22:01 bd808: Unblock 109.144.0.0/12 ([[phab:T422019|T422019]]) * 20:16 bd808: Unblock 93.192.0.0/10 ([[phab:T421894|T421894]]) * 19:25 dancy: Updating buildkitd to v0.29.0 in gitlab-cloud-runners (prod) ([[phab:T415284|T415284]]) * 17:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/97 ([[phab:T420441|T420441]]) * 17:39 bd808: Unblock 94.134.0.0/15 ([[phab:T421866|T421866]]) * 16:31 dancy: Upgrade buildkit to 0.29.0 in staging gitlab-cloud-runners ([[phab:T415284|T415284]]) * 10:47 taavi: integration-castor05: free up a bit of disk space by deleting cache for AhoCorasick/ CLDRPluralRuleParser/ HtmlFormatter/ RelPath/ RunningStat/ IPSet/ === 2026-03-30 === * 22:01 bd808: Unblock 78.20.0.0/14 ([[phab:T421586|T421586]]) * 21:04 bd808: Unblock 95.88.0.0/15 ([[phab:T421774|T421774]]) * 20:49 bd808: Unblock 95.89.191.0/24 ([[phab:T421774|T421774]]) * 20:29 bd808: Unblock 73.162.0.0/16 ([[phab:T421549|T421549]]) * 13:10 hashar: gerrit: abandon mediawiki/core changes that are 2+years old and are attached to a task (`Bug: Txxxx`) * 11:37 hashar: Reloaded Zuul to add 3 persons to the allow list * 10:43 James_F: Docker: Re-pushing to try to create quibble-coverage 1.16.0-s2 === 2026-03-27 === * 21:00 James_F: Docker: [quibble-bullseye] Drop Python 2 from images * 11:28 hashar: deployment-prep: removed block for `143.176.0.0/15` and blocked subblock `143.176.0.0/16` instead.
This unblocks `143.177.0.0/16` # [[phab:T421420|T421420]] * 00:18 bd808: Unblock 95.90.238.0/23 ([[phab:T421447|T421447]]) === 2026-03-26 === * 21:25 bd808: Unblock 89.240.0.0/15 ([[phab:T421364|T421364]]) * 21:09 brennen: patchdemo: deploy to production for https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/merge_requests/312 === 2026-03-25 === * 20:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256318 [[phab:T421283|T421283]] * 15:46 dancy: Migrated gitlab-cloud-runners (prod) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 15:32 dancy: Migrated gitlab-cloud-runners (staging) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 10:01 hashar: Updating tox Jenkins jobs to add support for Python 3.14 {{!}} https://gerrit.wikimedia.org/r/1260632 {{!}} [[phab:T421209|T421209]] * 08:40 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20/ === 2026-03-24 === * 19:40 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255746 * 15:34 brennen: gitlab1004: manual test run of `configure-projects` with cleared issue allowlist ([[phab:T412882|T412882]]) * 15:26 bd808: Unblock 47.194.0.0/16 ([[phab:T421127|T421127]]) * 12:53 hashar: integration: deleted old Puppet 5 compiler agents from Jenkins ( pcc-worker1014.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1015.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1016.puppet-diffs.eqiad1.wikimedia.cloud ) # [[phab:T367399|T367399]] * 07:42 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1259755 === 2026-03-23 === * 15:28 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 90272 === 2026-03-22 === * 14:52 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1258082 * 01:00 Krinkle: Reloading Zuul to deploy 
https://gerrit.wikimedia.org/r/1256488 === 2026-03-21 === * 08:10 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256962 * 07:48 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256946 === 2026-03-20 === * 21:21 bd808: Unblock 103.159.218.0/24 ([[phab:T420530|T420530]]) * 14:59 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add dependency on CodeMirror, for [[phab:T399673|T399673]] === 2026-03-19 === * 16:54 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255777 * 16:01 Krinkle: Hoist l10n-bot rights from labs/tools parent to labs parent to reduce duplication in other labs/ repos * 15:50 Krinkle: Create labs/xtools repo (branch: main, parent: labs, owner: labs-xtools), ref [[phab:T402086|T402086]] === 2026-03-18 === * 21:11 dcausse: [[phab:T403775|T403775]]: reindexing all wikis to enable new sorting options * 21:08 dcausse: restarting opensearch on deployment-cirrussearch(12{{!}}13{{!}}14) instances to pickup new plugin versions * 14:56 James_F: Zuul: Handle wmf/next the same way as wmf/branch_cut_pretest * 14:52 James_F: Zuul: [GrowthExperiments] drop duplicate VisualEditor dep * 14:52 James_F: Zuul: [search/*] Add experimental Java 25 jobs === 2026-03-17 === * 22:50 James_F: Zuul: [mediawiki/extensions/JsonForms] Add quibble jobs * 21:27 James_F: Zuul: search: Update opensearch plugins for Java 11/17, for [[phab:T420407|T420407]] * 20:20 bd808: Resize deployment-sessionstore06 from g4.cores1.ram2.disk20 to g4.cores2.ram4.disk20 ([[phab:T415021|T415021]]) * 16:43 James_F: Zuul: [BlueSpicePermissionManager] Add …ConfigManager & …UserManager deps * 14:36 James_F: Zuul: [mediawiki/extensions/ArticleGuidance]: Add SpamBlacklist as phan dep, for [[phab:T420015|T420015]] === 2026-03-13 === * 13:59 andrewbogott: deleting ptr record 117.0.16.172.in-addr.arpa.
-- accidental duplicate for deployment-kafka-logging01.deployment-prep.eqiad1.wikimedia.cloud * 13:04 elukey: re-create kafka-logging-01 in deployment-prep on trixie and Kafka 3.7 (was running on buster) * 09:13 elukey: upgrade kafka-jumbo and kafka-main to Confluent 7.7 in deployment-prep (pre-requisite before being able to upgrade to Trixie) === 2026-03-12 === * 21:23 bd808: Hard reboot deployment-sessionstore06 ([[phab:T415021|T415021]]) * 01:14 James_F: Docker: [helm-linter] Bump for Envoy 1.35.9, for [[phab:T419637|T419637]] === 2026-03-11 === * 16:48 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/MetricsPlatform # [[phab:T417568|T417568]] * 16:47 James_F: Zuul: [mediawiki/extensions/MetricsPlatform] Archive, for [[phab:T416865|T416865]] * 11:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1250529 "inference-services: Split policy violation CI into separate model jobs." - [[phab:T418832|T418832]] === 2026-03-10 === * 17:39 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner production * 17:11 hashar: Updated MediaWiki coverage jobs so that they now keep "Generate a local configuration by running `composer phpunit:config`" message # [[phab:T419073|T419073]] * 16:41 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner staging * 08:21 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 === 2026-03-09 === * 21:53 bd808: Reboot deployment-shellbox01 on the off chance that it makes the new permissions error go away ([[phab:T419440|T419440]]) * 13:13 James_F: Zuul: [mediawiki/extensions/WikiShare] Mark as archived, for [[phab:T413589|T413589]] * 13:11 James_F: Zuul: [mediawiki/extensions/Memento] Mark as archived, for [[phab:T369991|T369991]] * 13:10 James_F: Zuul: [mediawiki/extensions/QuickGV] Mark as archived, for [[phab:T413348|T413348]] * 13:10 James_F: Zuul: [mediawiki/extensions/SemanticImageInput] Mark as archived, for
[[phab:T413588|T413588]] * 13:09 James_F: Zuul: [mediawiki/extensions/SidebarDonateBox] Mark as archived, for [[phab:T413587|T413587]] * 13:07 James_F: Zuul: [mediawiki/extensions/SemanticSifter] Mark as archived, for [[phab:T413586|T413586]] * 13:06 James_F: Zuul: [mediawiki/extensions/GoogleAdSense] Mark as archived, for [[phab:T413585|T413585]] * 13:04 James_F: Zuul: [mediawiki/extensions/SecurityAPI] Mark as archived, for [[phab:T418008|T418008]] * 12:50 James_F: Zuul: [mediawiki/extensions/CheckUser] Add DiscussionTools dependency * 12:50 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add dependencies for TestKitchen * 10:40 hashar: gerrit: mediawiki/vendor: converted `es6` and `es710` branches to tags # [[phab:T417804|T417804]] * 09:24 hashar: Updating Quibble jobs to 1.16.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1248880 {{!}} [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 09:15 hashar: updating all CI Jenkins jobs using `./jjb-update` === 2026-03-06 === * 19:46 James_F: Zuul: [mediawiki/services/geoshapes] Mark as archived, for [[phab:T418372|T418372]] * 16:37 hashar: Building Docker images for Quibble 1.16.0 * 16:31 hashar: Tag Quibble 1.16.0 @ {{Gerrit|0b9db5fe3cabb2cec0b5d44e128bafa917b3b895}} # [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 12:32 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248411 "jjb, Zuul: vary Wikibase Selenium for release branches" {{!}} [[phab:T418797|T418797]] * 12:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248409/ "jjb, Zuul: rename wikibase-selenium job for clarity" {{!}} [[phab:T418797|T418797]] === 2026-03-05 === * 14:41 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add TestKitchen as a dependency for [[phab:T418053|T418053]] * 08:01 hashar: Reloaded Zuul to rename wikibase-client / wikibase-repo jobs {{!}} https://gerrit.wikimedia.org/r/1238317 * 00:04 James_F: Docker: 
[quibble-coverage] Use local PHPUnit config, for [[phab:T345481|T345481]] === 2026-03-04 === * 21:16 James_F: Zuul: [mediawiki/core] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 21:10 James_F: Zuul: [mediawiki/vendor] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 19:48 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/96 ([[phab:T419004|T419004]]) * 18:50 James_F: Revert "Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency", for [[phab:T419043|T419043]] * 16:23 James_F: Zuul: [mediawiki/services/parsoid] Make PHP 8.4 voting * 15:37 James_F: Docker: [rake-ruby2.7] Add libffi-dev too, for [[phab:T418463|T418463]] * 13:59 James_F: Docker: [rake-ruby2.7] Add ruby-ffi for [[phab:T418463|T418463]] * 13:54 hashar: SIGKILL Zuul cause it can't gracefully stop most probably due to being locked attempting to report back to Gerrit # [[phab:T419009|T419009]] * 13:49 hashar: Stopping Zuul # [[phab:T419009|T419009]] * 13:41 hashar: Took a Zuul stack dump on contint1002.wikimedia.org using SIGUSR1 # [[phab:T419009|T419009]] === 2026-03-03 === * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] Drop MetricsPlatform phan dep * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Drop MetricsPlatform phan dep === 2026-03-02 === * 22:13 James_F: Zuul: Enforce PHP 8.4 in MW extensions and skins for development branch, for [[phab:T386108|T386108]] * 14:05 James_F: Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency, for [[phab:T415451|T415451]] * 13:48 James_F: Zuul: […/WikimediaEvents] Drop LoginNotify dependency, now unused, for [[phab:T404334|T404334]] * 10:16 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/15.8.2/ # [[phab:T418718|T418718]] ===
2026-02-28 === * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/skins` # [[phab:T418675|T418675]] * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/extensions` # [[phab:T418675|T418675]] === 2026-02-27 === * 15:53 dancy: Updating gitlab-cloud-runners (staging and prod) to gitlab-runner 18.9.0. === 2026-02-26 === * 20:16 James_F: Zuul: Provide a custom, high-priority pipeline just for puppet compiler [[phab:T414621|T414621]] * 19:32 James_F: Docker: Bump all the PHPs. * 13:40 hashar: Deployed Jenkins job https://integration.wikimedia.org/ci/job/wikibase-selenium/ # [[phab:T287582|T287582]] * 00:13 dduvall: forcing replacement of buildkitd helm release in gitlab-cloud-runner prod cluster due to dependency on removed k8s secret ([[phab:T416260|T416260]]) === 2026-02-25 === * 23:50 dduvall: deploying https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/552 to gitlab-cloud-runner production cluster ([[phab:T416260|T416260]]) * 14:07 James_F: Zuul: [mediawiki/extensions/CommunityRequests] Add TemplateData dependency, for [[phab:T401638|T401638]] * 00:08 jeena: no-op testing updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/95 === 2026-02-24 === * 15:55 brennen: devtools: test deploy phab/phorge to test instance ([[phab:T418256|T418256]]) === 2026-02-23 === * 23:07 jeena: Updated development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:43 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:12 bd808: Unblock 191.80.192.0/18 ([[phab:T418132|T418132]]) * 20:26 hashar: Deleted "replication-upstream" Grafana dashboard in favor of a copy/new "replication" one. 
https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream , replaced it by https://grafana.wikimedia.org/d/d4a4da73-c27f-4ce6-a9e5-ab84dd7a4ebb/replication * 16:29 James_F: Zuul: [3d2png] Add basic Node CI at version 20 === 2026-02-20 === * 21:47 bd808: Unblock 168.184.84.0/24 ([[phab:T418020|T418020]]) * 17:13 bd808: Unblock 122.187.64.0/18 ([[phab:T417964|T417964]]) * 14:35 James_F: Zuul: [mediawiki/extensions/Monstranto] Move out of Wikimedia prod section === 2026-02-19 === * 18:34 bd808: Unblock 181.98.0.0/16 ([[phab:T417890|T417890]]) * 17:21 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Add AbuseFilter as a dependency, for [[phab:T417799|T417799]] * 13:22 hashar: Reloaded Zuul to archive the Cergen repository {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1240688 {{!}} [[phab:T417887|T417887]] === 2026-02-18 === * 20:17 jeena: Updating development images on contint primary for [[phab:T415922|T415922]] * 19:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240360 * 18:40 bd808: Unblock 46.59.0.0/17 ([[phab:T417747|T417747]]) * 17:05 hashar: Regenerating Jenkins jobs with JJB based on https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 17:04 hashar: Added EXT_DEPENDENCIES to Quibble Jenkins jobs parameters so we can manually trigger them from the Web UI using a different set of deps # https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 16:30 hashar: Triggered https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ with empty Zuul parameters introduced by https://gerrit.wikimedia.org/r/1240333 {{!}} https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/4893/console * 15:43 James_F: Zuul: [mediawiki/extensions/ReadingLists] Add EventBus dependency for [[phab:T417706|T417706]] * 12:15 hashar: zuul-1001.zuul3.eqiad1.wikimedia.cloud: added keepalive=20 to the scheduler Gerrit driver and restarted scheduler container # [[phab:T417497|T417497]] * 06:58 jeena: 
Updating development images on contint primary for [[phab:T415922|T415922]] === 2026-02-17 === * 23:37 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240081 * 23:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240078 * 15:58 brennen: deployed latest phab/phorge wmf/stable to devtools test instance ([[phab:T417657|T417657]]) * 09:01 hashar: Reloaded Zuul to enable php 8.5 testing on utfnormal, php-session-serializer, wikipeg, mediawiki/libs/Dodo, mediawiki/libs/UUID, testing-access-wrapper and translatewiki # [[phab:T406326|T406326]] === 2026-02-16 === * 15:27 hashar: Manually cleaned some old workspaces on integration-agent-docker-1042 === 2026-02-12 === * 20:07 James_F: Zuul: Enable PHP 8.5 jobs for most MW libraries, for [[phab:T406326|T406326]] * 19:33 James_F: Docker: [php83] Re-build with upstream's new 8.3.30 release and cascade * 19:31 James_F: Zuul: Add PHP 8.5 CI job to various things noted as blocked by Phan, for [[phab:T410941|T410941]], [[phab:T406326|T406326]] * 16:35 Krinkle: Disable publishing noise on tasks from repos Bcp47, clover-diff, ScopedCallback, and IDLeDOM. 
Ref [[phab:T143162|T143162]] * 15:53 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/87 * 11:21 James_F: Zuul: [mediawiki/libs/shellbox] Add direct Phan job, for [[phab:T416064|T416064]] === 2026-02-10 === * 20:16 dancy: Rebooted k3s.catalyst-dev (it was unresponsive, but the reboot hasn't helped) === 2026-02-09 === * 21:58 James_F: Zuul: [mediawiki/tools/phan] Add PHP 8.5 CI job, for [[phab:T410941|T410941]] * 19:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1238006 [[phab:T415680|T415680]] * 11:51 James_F: Zuul: [mediawiki/extensions/ReadingLists] Drop MetricsPlatform dependency, for [[phab:T414435|T414435]] === 2026-02-05 === * 17:58 James_F: Zuul: […/WikimediaCustomizations] Add six new dependencies for [[phab:T404334|T404334]] * 15:35 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1237254 * 15:18 James_F: Zuul: […/OATHAuth] Add dependency and phan dependency on CentralAuth === 2026-02-04 === * 12:54 James_F: Zuul: [mediawiki/extensions/Petition] Add CLDR dependency * 10:03 hashar: Restarted Jenkins on releases2003.codfw.wmnet === 2026-02-02 === * 21:17 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1234926 "re-enable master jobs for some BlueSpice repos - [[phab:T403196|T403196]]" * 21:05 bd808: Unblock 85.146.0.0/17 ([[phab:T416079|T416079]]) * 19:47 James_F: Zuul: […/WikimediaCustomizations] Add cldr phan dependency, for [[phab:T404334|T404334]] * 17:33 bd808: Unblock 188.188.0.0/15 ([[phab:T416095|T416095]]) * 17:26 bd808: Unblock 85.94.84.0/22 ([[phab:T416105|T416105]]) * 17:09 bd808: Unblock 94.234.0.0/16 ([[phab:T416165|T416165]]) * 16:51 dancy: Update gitlab-runners to alpine-v18.6.6 ([[phab:T415214|T415214]]) * 16:27 bd808: Unblock 47.231.208.0/21 ([[phab:T416010|T416010]]) * 11:39 James_F: Zuul: […/WikimediaCustomizations] Add five new phan dependencies, for [[phab:T404334|T404334]] * 09:45
Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 58532, 58557 === 2026-01-31 === * 21:49 James_F: Deleted Jenkins's job entry for castor-save-workspace-cache {{Gerrit|6193776}} and this seems to have unstuck things for [[phab:T416078|T416078]]? * 21:45 James_F: Running `sudo systemctl restart jenkins` on contint for [[phab:T416078|T416078]] * 21:44 James_F: Fighting [[phab:T416078|T416078]], took integration-castor-5 offline, disconnected, sshed in to kill threads, then reconnected; no change in aspect. * 19:03 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1235380 === 2026-01-28 === * 21:26 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn # [[phab:T415832|T415832]] * 21:11 bd808: Unblock 181.160.0.0/15 & 186.40.128.0/17 ([[phab:T415820|T415820]]) * 17:01 bd808: Unblock 102.182.0.0/16 ([[phab:T415782|T415782]]) === 2026-01-27 === * 16:45 James_F: Zuul: Switch skin-quibble template with identical extension-quibble, for [[phab:T402398|T402398]] * 16:18 James_F: Zuul: [ArticleGuidance] mention it will be in production * 15:55 James_F: Docker: [quibble-bullseye] Update to Quibble 1.15.0 * 15:12 James_F: Docker: [quibble-coverage] Pass PHPUnit config location explicitly, for [[phab:T395470|T395470]] * 09:18 hashar: integration: on integration-castor05, deleted caches for old MediaWiki branches * 09:15 hashar: integration: on pkgbuilder instances, removed Buster cow images, aptcache and hooks. 
`sudo cumin --force -p 0 'name:pkgbuilder' 'rm -fR /srv/pbuilder/<nowiki>{</nowiki>base-buster-amd64.cow,hooks/buster,aptcache/buster-amd64<nowiki>}</nowiki>'` # [[phab:T397209|T397209]] * 09:14 hashar: integration: cleaned up old workspaces under /srv/jenkins/workspace === 2026-01-26 === * 23:27 bd808: Unblock 66.130.0.0/15 ([[phab:T415596|T415596]]) * 22:52 bd808: Unblock 45.16.0.0/12 ([[phab:T415467|T415467]]) * 14:46 hashar: gerrit: changed `operations/software/permissions` project type from `CODE` to `PERMISSIONS` by pointing `HEAD` to `refs/meta/config` === 2026-01-22 === * 17:36 James_F: Docker: [quibble-coverage] Stop using legacy PHPUnit entrypoint ([[phab:T395470|T395470]]) & Stop excluding Dump/ParserFuzz/Stub groups ([[phab:T415230|T415230]]) * 15:11 James_F: Zuul: [mediawiki/extensions/Math] Add a standalone job, for [[phab:T415230|T415230]] === 2026-01-20 === * 20:38 bd808: Cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229186 ([[phab:T415113|T415113]]) * 19:05 bd808: Rebooted deployment-cache-text08 to see if the mystery haproxy startup failure would go away ([[phab:T415100|T415100]]) * 18:50 bd808: Unblock 152.7.0.0/16 ([[phab:T415100|T415100]]) === 2026-01-17 === * 23:32 ori: beta-scap with `php_l10n: true` completed successfully: https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/241466/console. PHP l10n files generated. Reverted local change to scap.cfg. * 23:26 ori: Temporarily set `php_l10n: true` on deployment-deploy04:/etc/scap.cfg to see if next scap succeeds. 
=== 2026-01-16 === * 16:33 dancy: Deleting deployment-mx03.deployment-prep ([[phab:T412975|T412975]]) === 2026-01-15 === * 14:50 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/ArticleSummaries/ # [[phab:T413232|T413232]] === 2026-01-14 === * 17:14 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226907 * 16:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226893 * 15:57 bd808: Unblock 190.60.63.0/24 ([[phab:T414541|T414541]]) === 2026-01-13 === * 15:04 James_F: Zuul: Make quibble-for-mediawiki-core-vendor-mysql-php84 voting, for [[phab:T386108|T386108]] === 2026-01-12 === * 21:33 zabe: zabe@deployment-mwmaint03:~$ foreachwiki migrateLinksTable.php --table imagelinks # [[phab:T413668|T413668]] * 21:06 bd808: Unblock 66.81.168.0/21 ([[phab:T414303|T414303]]) * 17:42 dancy: Turned off instance deployment-prep.deployment-mx03 * 11:44 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 46331, 46344 === 2026-01-10 === * 21:48 taavi: reload zuul for https://gerrit.wikimedia.org/r/1224782 * 00:25 bd808: Unblock 91.160.0.0/12 ([[phab:T414190|T414190]]) === 2026-01-09 === * 17:33 thcipriani: re-enabling beta update jobs after test bad extension-list [[phab:T411516|T411516]] * 17:09 thcipriani: disabling beta update jobs to test bad extension-list [[phab:T411516|T411516]] === 2026-01-08 === * 21:30 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224815 [[phab:T414136|T414136]] * 18:24 bd808: Unblock 89.80.0.0/12 ([[phab:T414113|T414113]]) * 15:55 dancy: Upgrading gitlab-runner to v18.5.0 on gitlab-cloud-runners.
([[phab:T414053|T414053]]) === 2026-01-07 === * 23:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1082574 https://gerrit.wikimedia.org/r/1224157 https://gerrit.wikimedia.org/r/1224159 * 23:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/896311 [[phab:T27482|T27482]] * 23:06 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224218 * 17:34 James_F: Zuul: Add new extensions: IssueTrackerLinks, PreviewLinks, and WikiRAG * 17:34 James_F: Zuul: [labs/tools/heritage] Point to the task to drop 8.1 testing * 15:09 James_F: Zuul: [labs/tools/heritage] Add testing in PHP 8.2+, not just PHP 8.1 * 15:03 James_F: Zuul: Even for extension-broken, don't offer PHP 8.1 testing * 15:02 James_F: Zuul: Move quibble experimental sqlite/postgres tests to PHP 8.3 === 2026-01-06 === * 16:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223690 [[phab:T411814|T411814]] * 16:16 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223189 [[phab:T411814|T411814]] * 00:30 bd808: Unblock 85.134.128.0/17 ([[phab:T413755|T413755]]) * 00:02 bd808: Unblock 89.166.128.0/17 ([[phab:T413702|T413702]]) === 2026-01-05 === * 23:57 bd808: Unblock 185.233.104.0/22 ([[phab:T413472|T413472]]) * 23:51 bd808: Unblock 45.62.112.0/21 ([[phab:T413079|T413079]]) * 23:44 bd808: Unblock 85.134.200.0/21 ([[phab:T413067|T413067]]) * 19:03 dancy: Updated buildkitd to v0.26.3 in gitlab-cloud-runners * 14:27 taavi: reload zuul for {{Gerrit|1223191}} * 13:57 James_F: Zuul: [mediawiki/php/wmerrors] Enable PHP 8.5 testing, for [[phab:T410921|T410921]] === 2026-01-03 === * 17:59 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222709 https://gerrit.wikimedia.org/r/1220388 https://gerrit.wikimedia.org/r/1219140 === 2026-01-02 === * 17:10 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222597 === 2026-01-01 === * 02:34 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1221644 <noinclude>'''Server Admin 
Log''' logged from {{IRC|wikimedia-releng}} for [[Nova Resource:Deployment-prep|Beta Cluster]], [[mw:Continuous integration|Continuous integration]] and various other Release Engineering projects.</noinclude> {{SAL-archives/Release Engineering}} <noinclude>[[Category:SAL]]</noinclude> krp1u22isxpijeg57irds9os5lfix65 Map of database maintenance 0 449160 2414307 2414169 2026-05-16T00:02:55Z Dexbot 30554 Bot: Updating the report 2414307 wikitext text/x-wiki {{/Header}} == Today (2026-05-16) == == Yesterday (2026-05-15) == {| class="wikitable" |+ codfw |- ! Section !! Work |- | s7 || [[phab:T426380|Switchover s7 master (db2218 -&gt; db2220) (T426380)]] (marostegui) |- |} == Last seven days == {| class="wikitable" |+ eqiad |- ! Section !! Work |- | es6 || [[phab:T419961|Login (T419961)]] (fceratto) |- |} {| class="wikitable" |+ codfw |- ! Section !! Work |- | pc3 || [[phab:T418973|Productionize pc20[21-24] and pc10[21-24] (T418973)]] (marostegui) |- | s7 || * [[phab:T419961|Login (T419961)]] (fceratto) * [[phab:T426142|Switchover s7 master (db2220 -&gt; db2218) (T426142)]] (marostegui) * [[phab:T426380|Switchover s7 master (db2218 -&gt; db2220) (T426380)]] (marostegui) |- | s8 || * [[phab:T419635|Drop il_to column from imagelinks table in wmf production (T419635)]] (fceratto) * [[phab:T419961|Login (T419961)]] (fceratto) * [[phab:T426291|Switchover s8 master (db2165 -&gt; db2161) (T426291)]] (fceratto) |- |} [[Category:MariaDB]] 6w0kbd1y9pyg6vdhpzgn86dm2l2b5ck Wikimedia Cloud Services team/Ownership 0 452153 2414265 2414094 2026-05-15T15:33:37Z FNegri-WMF 32595 Add Admin docs link for Toolforge API Gateway 2414265 wikitext text/x-wiki {| class="wikitable sortable" ! Service !AKA!! Phabricator Tag !! 
Category !Admin docs !Alerts !Support level |- | [https://paws.wmcloud.org/ PAWS] | || [[phab:tag/paws|paws]] || Tool | | |{{rating|3|3}} |- | [[quarry:|Quarry]] | || [[phab:tag/quarry/|quarry]] || Tool | | |{{rating|1|3}} |- | <s>[[Superset]]</s> | || [[phab:tag/superset.wmcloud.org/|superset.wmcloud.org]] || Tool |''This is now supported by the community'' | |N/A |- | [https://toolhub.wikimedia.org/ Toolhub] | || [[phab:tag/toolhub|toolhub]] || Tool |[[toolhub.wikimedia.org]] | |{{rating|3|3}} |- | [http://toolsadmin.wikimedia.org/ Toolsadmin] |Striker|| [[phab:tag/striker/|striker]] || Tool |[[toolsadmin.wikimedia.org]] | |{{rating|3|3}} |- | [https://openstack-browser.toolforge.org/ OpenStack Browser] | || || Tool |[[Tool:Openstack-browser]] | |{{rating|2|3}} |- |[https://k8s-status.toolforge.org/ K8s status] | | |Tool | | | |- |[[Tool:Fourohfour]] | | |Tool |[[Tool:Fourohfour]] | | |- |Toolforge API Gateway | |TBD |Platform Service |[[Portal:Toolforge/Admin/API Gateway]] | |TBD |- | [[Help:Toolforge/Jobs framework|Toolforge Jobs Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [https://toolforge.org/ Toolforge] (k8s) | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Help:Toolforge/Envvars_Service|Toolforge Envvars Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Portal:Toolforge/Ongoing Efforts/Toolforge Build Service|Toolforge Build Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Help:Toolforge/Deploy_your_tool|Toolforge Components Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- |[[Portal:Toolforge/Admin/Kubernetes/lima-kilo|Toolforge lima-kilo]] | | |Platform Service | | | |- | [[Help:Toolforge/Database|Toolsdb (Db-as-a-service)]] | || [[phab:tag/toolforge|toolforge]]|| Managed Service |[[Portal:Toolforge/Admin/ToolsDB]] | |{{rating|3|3}} |- | 
[[Cinder|Volumes (storage-as-a-service)]] |OpenStack Cinder|| || Managed Service | | |{{rating|3|3}} |- | [[Help:Trove_database_user_guide|Databases (Db-as-a-service)]] |OpenStack Trove|| || Managed Service |[[Portal:Cloud VPS/Admin/Trove]] | |{{rating|2|3}} |- | [[Portal:Cloud VPS/Admin/Magnum|Kubernetes (k8s-as-a-service)]] | OpenStack Magnum|| || Managed Service |[[Portal:Cloud VPS/Admin/Magnum]] | |{{rating|1|3}} |- | [[Portal:Cloud VPS|Compute (VM-as-a-service)]] |OpenStack Nova|| || Managed Service | | |{{rating|3|3}} |- | [[Portal:Cloud VPS/Admin/DNS|DNS (DNS-as-a-service)]] |OpenStack Designate|| || Managed Service |[[Portal:Cloud VPS/Admin/DNS]][[Portal:Cloud VPS/Admin/DNS/Designate]] | |{{rating|3|3}} |- | [[horizonlabs:|Cloud Dashboards]] |OpenStack Horizon|| || Managed Service |[[Portal:Cloud VPS/Admin/Horizon]] | |{{rating|3|3}} |- | [[Help:Horizon FAQ|Puppet/nova integration + Horizon UI]] | || || Managed Service | | |{{rating|3|3}} |- | [[Help:Using_a_web_proxy_to_reach_Cloud_VPS_servers_from_the_internet|Nova proxy service]] | || || Managed Service | | |{{rating|3|3}} |- | VM backups | || || Managed Service |[[Portal:Cloud VPS/Admin/Instance backups]] | |{{rating|2|3}} |- | Cinder Backups | || || Managed Service | | |{{rating|1|3}} |- | [[Portal:Cloud VPS/Infrastructure|Openstack]] | || || Foundational infrastructure | | |{{rating|3|3}} |- | [[Nova Resource:Metricsinfra/Documentation|metricsinfra]] | || || Foundational infrastructure | | |{{rating|2|3}} |- | [[Ceph]] | || || Foundational infrastructure |[[Portal:Cloud VPS/Admin/Ceph]] | |{{rating|3|3}} |- | [[Help:Shared storage|Shared storage]] | Read/write NFS|| || Foundational infrastructure |[[Portal:Data Services/Admin/Shared storage]] | |{{rating|3|3}} |- |[[Help:Wiki Replicas|Wiki Replicas]] | |[[phab:tag/data-services/|data-services]] |Managed Service |[[Portal:Data Services/Admin/Wiki Replicas]] | |{{rating|3|3}} |- |[[Dumps/Dump servers|Dumps servers]] |clouddumps 
|[[phab:tag/data-services/|data-services]] |Managed Service |[[Portal:Data Services/Admin/Dumps]] | |{{rating|2|3}} |} === Support levels === {{rating|3|3}} Fully supported. These services should be working and available to users at all times; any downtime will be publicly announced and documented. Users can rely on these services as reliable, long-term foundations for their work. {{rating|2|3}} Partial support. Service should be useful for some use cases and running most of the time but some features may be broken or consciously neglected. Outages will be tracked but not necessarily addressed immediately. Approach with caution when relying on these services for major projects. {{rating|1|3}} Little staff support. Service may be experimental, a work in progress, volunteer-maintained, or in the process of being phased out. Outages may be ignored entirely. Do not rely on stability of these services for your work. {{Note|All WMCS projects are held to a 'best effort' standard. We do not guarantee particular uptime metrics, and even urgent work will not take priority over sleep, crying babies, meals, etc.}} == See also == * [[Portal:Cloud VPS/Admin/Skill matrix]] an08ke7gaeax4bdnt4kvhbfgy9rqchg 2414267 2414265 2026-05-15T15:42:16Z FNegri-WMF 32595 Add phab tags where missing 2414267 wikitext text/x-wiki {| class="wikitable sortable" ! Service !AKA!! Phabricator Tag !! 
Category !Admin docs !Alerts !Support level |- | [https://paws.wmcloud.org/ PAWS] | || [[phab:tag/paws|paws]] || Tool | | |{{rating|3|3}} |- | [[quarry:|Quarry]] | || [[phab:tag/quarry/|quarry]] || Tool | | |{{rating|1|3}} |- | <s>[[Superset]]</s> | || [[phab:tag/superset.wmcloud.org/|superset.wmcloud.org]] || Tool |''This is now supported by the community'' | |N/A |- | [https://toolhub.wikimedia.org/ Toolhub] | || [[phab:tag/toolhub|toolhub]] || Tool |[[toolhub.wikimedia.org]] | |{{rating|3|3}} |- | [http://toolsadmin.wikimedia.org/ Toolsadmin] |Striker|| [[phab:tag/striker/|striker]] || Tool |[[toolsadmin.wikimedia.org]] | |{{rating|3|3}} |- | [https://openstack-browser.toolforge.org/ OpenStack Browser] | || [[phab:tag/tool-openstack-browser|tool-openstack-browser]]|| Tool |[[Tool:Openstack-browser]] | |{{rating|2|3}} |- |[https://k8s-status.toolforge.org/ K8s status] | |[[phab:tag/tool-k8s-status|tool-k8s-status]] |Tool | | | |- |[[Tool:Fourohfour]] | |[phab:tag/toolforge][[phab:tag/toolforge|toolforge]] |Tool |[[Tool:Fourohfour]] | | |- |Toolforge API Gateway | |[[phab:tag/toolforge|toolforge]] |Platform Service |[[Portal:Toolforge/Admin/API Gateway]] | |TBD |- | [[Help:Toolforge/Jobs framework|Toolforge Jobs Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [https://toolforge.org/ Toolforge] (k8s) | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Help:Toolforge/Envvars_Service|Toolforge Envvars Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Portal:Toolforge/Ongoing Efforts/Toolforge Build Service|Toolforge Build Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Help:Toolforge/Deploy_your_tool|Toolforge Components Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- |[[Portal:Toolforge/Admin/Kubernetes/lima-kilo|Toolforge lima-kilo]] | 
|[[phab:tag/toolforge|toolforge]] |Platform Service | | | |- | [[Help:Toolforge/Database|Toolsdb (Db-as-a-service)]] | || [[phab:tag/toolforge|toolforge]]|| Managed Service |[[Portal:Toolforge/Admin/ToolsDB]] | |{{rating|3|3}} |- | [[Cinder|Volumes (storage-as-a-service)]] |OpenStack Cinder|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | [[Help:Trove_database_user_guide|Databases (Db-as-a-service)]] |OpenStack Trove|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Trove]] | |{{rating|2|3}} |- | [[Portal:Cloud VPS/Admin/Magnum|Kubernetes (k8s-as-a-service)]] | OpenStack Magnum|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Magnum]] | |{{rating|1|3}} |- | [[Portal:Cloud VPS|Compute (VM-as-a-service)]] |OpenStack Nova|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | [[Portal:Cloud VPS/Admin/DNS|DNS (DNS-as-a-service)]] |OpenStack Designate|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/DNS]][[Portal:Cloud VPS/Admin/DNS/Designate]] | |{{rating|3|3}} |- | [[horizonlabs:|Cloud Dashboards]] |OpenStack Horizon|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Horizon]] | |{{rating|3|3}} |- | [[Help:Horizon FAQ|Puppet/nova integration + Horizon UI]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | [[Help:Using_a_web_proxy_to_reach_Cloud_VPS_servers_from_the_internet|Nova proxy service]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | VM backups | || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Instance backups]] | |{{rating|2|3}} |- | Cinder Backups | || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|1|3}} |- | [[Portal:Cloud VPS/Infrastructure|Openstack]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Foundational infrastructure | | |{{rating|3|3}} |- | [[Nova 
Resource:Metricsinfra/Documentation|metricsinfra]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Foundational infrastructure |[[Nova Resource:Metricsinfra]] | |{{rating|2|3}} |- | [[Ceph]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Foundational infrastructure |[[Portal:Cloud VPS/Admin/Ceph]] | |{{rating|3|3}} |- | [[Help:Shared storage|Shared storage]] | Read/write NFS|| [phab:tag/data-services/][[phab:tag/cloud-vps|data-services]]|| Foundational infrastructure |[[Portal:Data Services/Admin/Shared storage]] | |{{rating|3|3}} |- |[[Help:Wiki Replicas|Wiki Replicas]] | |[[phab:tag/data-services/|data-services]] |Managed Service |[[Portal:Data Services/Admin/Wiki Replicas]] | |{{rating|3|3}} |- |[[Dumps/Dump servers|Dumps servers]] |clouddumps |[[phab:tag/data-services/|data-services]] |Managed Service |[[Portal:Data Services/Admin/Dumps]] | |{{rating|2|3}} |} === Support levels === {{rating|3|3}} Fully supported. These services should be working and available to users at all times; any downtime will be publicly announced and documented. Users can rely on these services as reliable, long-term foundations for their work. {{rating|2|3}} Partial support. Service should be useful for some use cases and running most of the time but some features may be broken or consciously neglected. Outages will be tracked but not necessarily addressed immediately. Approach with caution when relying on these services for major projects. {{rating|1|3}} Little staff support. Service may be experimental, a work in progress, volunteer-maintained, or in the process of being phased out. Outages may be ignored entirely. Do not rely on stability of these services for your work. {{Note|All WMCS projects are held to a 'best effort' standard. 
We do not guarantee particular uptime metrics, and even urgent work will not take priority over sleep, crying babies, meals, etc.}} == See also == * [[Portal:Cloud VPS/Admin/Skill matrix]] onj1p4nttyqwjiqbsmul0keevlsskvm 2414268 2414267 2026-05-15T15:44:57Z FNegri-WMF 32595 fix formatting 2414268 wikitext text/x-wiki {| class="wikitable sortable" ! Service !AKA!! Phabricator Tag !! Category !Admin docs !Alerts !Support level |- | [https://paws.wmcloud.org/ PAWS] | || [[phab:tag/paws|paws]] || Tool | | |{{rating|3|3}} |- | [[quarry:|Quarry]] | || [[phab:tag/quarry/|quarry]] || Tool | | |{{rating|1|3}} |- | <s>[[Superset]]</s> | || [[phab:tag/superset.wmcloud.org/|superset.wmcloud.org]] || Tool |''This is now supported by the community'' | |N/A |- | [https://toolhub.wikimedia.org/ Toolhub] | || [[phab:tag/toolhub|toolhub]] || Tool |[[toolhub.wikimedia.org]] | |{{rating|3|3}} |- | [http://toolsadmin.wikimedia.org/ Toolsadmin] |Striker|| [[phab:tag/striker/|striker]] || Tool |[[toolsadmin.wikimedia.org]] | |{{rating|3|3}} |- | [https://openstack-browser.toolforge.org/ OpenStack Browser] | || [[phab:tag/tool-openstack-browser|tool-openstack-browser]]|| Tool |[[Tool:Openstack-browser]] | |{{rating|2|3}} |- |[https://k8s-status.toolforge.org/ K8s status] | |[[phab:tag/tool-k8s-status|tool-k8s-status]] |Tool | | | |- |[[Tool:Fourohfour]] | |[[phab:tag/toolforge|toolforge]] |Tool |[[Tool:Fourohfour]] | | |- |Toolforge API Gateway | |[[phab:tag/toolforge|toolforge]] |Platform Service |[[Portal:Toolforge/Admin/API Gateway]] | |TBD |- | [[Help:Toolforge/Jobs framework|Toolforge Jobs Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [https://toolforge.org/ Toolforge] (k8s) | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Help:Toolforge/Envvars_Service|Toolforge Envvars Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Portal:Toolforge/Ongoing 
Efforts/Toolforge Build Service|Toolforge Build Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Help:Toolforge/Deploy_your_tool|Toolforge Components Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- |[[Portal:Toolforge/Admin/Kubernetes/lima-kilo|Toolforge lima-kilo]] | |[[phab:tag/toolforge|toolforge]] |Platform Service | | | |- | [[Help:Toolforge/Database|Toolsdb (Db-as-a-service)]] | || [[phab:tag/toolforge|toolforge]]|| Managed Service |[[Portal:Toolforge/Admin/ToolsDB]] | |{{rating|3|3}} |- | [[Cinder|Volumes (storage-as-a-service)]] |OpenStack Cinder|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | [[Help:Trove_database_user_guide|Databases (Db-as-a-service)]] |OpenStack Trove|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Trove]] | |{{rating|2|3}} |- | [[Portal:Cloud VPS/Admin/Magnum|Kubernetes (k8s-as-a-service)]] | OpenStack Magnum|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Magnum]] | |{{rating|1|3}} |- | [[Portal:Cloud VPS|Compute (VM-as-a-service)]] |OpenStack Nova|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | [[Portal:Cloud VPS/Admin/DNS|DNS (DNS-as-a-service)]] |OpenStack Designate|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/DNS]][[Portal:Cloud VPS/Admin/DNS/Designate]] | |{{rating|3|3}} |- | [[horizonlabs:|Cloud Dashboards]] |OpenStack Horizon|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Horizon]] | |{{rating|3|3}} |- | [[Help:Horizon FAQ|Puppet/nova integration + Horizon UI]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | [[Help:Using_a_web_proxy_to_reach_Cloud_VPS_servers_from_the_internet|Nova proxy service]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | VM backups | || 
[[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Instance backups]] | |{{rating|2|3}} |- | Cinder Backups | || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|1|3}} |- | [[Portal:Cloud VPS/Infrastructure|Openstack]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Foundational infrastructure | | |{{rating|3|3}} |- | [[Nova Resource:Metricsinfra/Documentation|metricsinfra]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Foundational infrastructure |[[Nova Resource:Metricsinfra]] | |{{rating|2|3}} |- | [[Ceph]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Foundational infrastructure |[[Portal:Cloud VPS/Admin/Ceph]] | |{{rating|3|3}} |- | [[Help:Shared storage|Shared storage]] | Read/write NFS|| [phab:tag/data-services/][[phab:tag/cloud-vps|data-services]]|| Foundational infrastructure |[[Portal:Data Services/Admin/Shared storage]] | |{{rating|3|3}} |- |[[Help:Wiki Replicas|Wiki Replicas]] | |[[phab:tag/data-services/|data-services]] |Managed Service |[[Portal:Data Services/Admin/Wiki Replicas]] | |{{rating|3|3}} |- |[[Dumps/Dump servers|Dumps servers]] |clouddumps |[[phab:tag/data-services/|data-services]] |Managed Service |[[Portal:Data Services/Admin/Dumps]] | |{{rating|2|3}} |} === Support levels === {{rating|3|3}} Fully supported. These services should be working and available to users at all times; any downtime will be publicly announced and documented. Users can rely on these services as reliable, long-term foundations for their work. {{rating|2|3}} Partial support. Service should be useful for some use cases and running most of the time but some features may be broken or consciously neglected. Outages will be tracked but not necessarily addressed immediately. Approach with caution when relying on these services for major projects. {{rating|1|3}} Little staff support. Service may be experimental, a work in progress, volunteer-maintained, or in the process of being phased out. Outages may be ignored entirely. 
Do not rely on stability of these services for your work. {{Note|All WMCS projects are held to a 'best effort' standard. We do not guarantee particular uptime metrics, and even urgent work will not take priority over sleep, crying babies, meals, etc.}} == See also == * [[Portal:Cloud VPS/Admin/Skill matrix]] hg111zd96p637gt0qfeu6pa3lw8nueq 2414270 2414268 2026-05-15T15:46:31Z FNegri-WMF 32595 shared storage: fix formatting 2414270 wikitext text/x-wiki {| class="wikitable sortable" ! Service !AKA!! Phabricator Tag !! Category !Admin docs !Alerts !Support level |- | [https://paws.wmcloud.org/ PAWS] | || [[phab:tag/paws|paws]] || Tool | | |{{rating|3|3}} |- | [[quarry:|Quarry]] | || [[phab:tag/quarry/|quarry]] || Tool | | |{{rating|1|3}} |- | <s>[[Superset]]</s> | || [[phab:tag/superset.wmcloud.org/|superset.wmcloud.org]] || Tool |''This is now supported by the community'' | |N/A |- | [https://toolhub.wikimedia.org/ Toolhub] | || [[phab:tag/toolhub|toolhub]] || Tool |[[toolhub.wikimedia.org]] | |{{rating|3|3}} |- | [http://toolsadmin.wikimedia.org/ Toolsadmin] |Striker|| [[phab:tag/striker/|striker]] || Tool |[[toolsadmin.wikimedia.org]] | |{{rating|3|3}} |- | [https://openstack-browser.toolforge.org/ OpenStack Browser] | || [[phab:tag/tool-openstack-browser|tool-openstack-browser]]|| Tool |[[Tool:Openstack-browser]] | |{{rating|2|3}} |- |[https://k8s-status.toolforge.org/ K8s status] | |[[phab:tag/tool-k8s-status|tool-k8s-status]] |Tool | | | |- |[[Tool:Fourohfour]] | |[[phab:tag/toolforge|toolforge]] |Tool |[[Tool:Fourohfour]] | | |- |Toolforge API Gateway | |[[phab:tag/toolforge|toolforge]] |Platform Service |[[Portal:Toolforge/Admin/API Gateway]] | |TBD |- | [[Help:Toolforge/Jobs framework|Toolforge Jobs Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [https://toolforge.org/ Toolforge] (k8s) | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Help:Toolforge/Envvars_Service|Toolforge 
Envvars Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Portal:Toolforge/Ongoing Efforts/Toolforge Build Service|Toolforge Build Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- | [[Help:Toolforge/Deploy_your_tool|Toolforge Components Service]] | || [[phab:tag/toolforge|toolforge]] || Platform Service | | |{{rating|3|3}} |- |[[Portal:Toolforge/Admin/Kubernetes/lima-kilo|Toolforge lima-kilo]] | |[[phab:tag/toolforge|toolforge]] |Platform Service | | | |- | [[Help:Toolforge/Database|Toolsdb (Db-as-a-service)]] | || [[phab:tag/toolforge|toolforge]]|| Managed Service |[[Portal:Toolforge/Admin/ToolsDB]] | |{{rating|3|3}} |- | [[Cinder|Volumes (storage-as-a-service)]] |OpenStack Cinder|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | [[Help:Trove_database_user_guide|Databases (Db-as-a-service)]] |OpenStack Trove|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Trove]] | |{{rating|2|3}} |- | [[Portal:Cloud VPS/Admin/Magnum|Kubernetes (k8s-as-a-service)]] | OpenStack Magnum|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Magnum]] | |{{rating|1|3}} |- | [[Portal:Cloud VPS|Compute (VM-as-a-service)]] |OpenStack Nova|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | [[Portal:Cloud VPS/Admin/DNS|DNS (DNS-as-a-service)]] |OpenStack Designate|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/DNS]][[Portal:Cloud VPS/Admin/DNS/Designate]] | |{{rating|3|3}} |- | [[horizonlabs:|Cloud Dashboards]] |OpenStack Horizon|| [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Horizon]] | |{{rating|3|3}} |- | [[Help:Horizon FAQ|Puppet/nova integration + Horizon UI]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | [[Help:Using_a_web_proxy_to_reach_Cloud_VPS_servers_from_the_internet|Nova proxy service]] 
| || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|3|3}} |- | VM backups | || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service |[[Portal:Cloud VPS/Admin/Instance backups]] | |{{rating|2|3}} |- | Cinder Backups | || [[phab:tag/cloud-vps|cloud-vps]]|| Managed Service | | |{{rating|1|3}} |- | [[Portal:Cloud VPS/Infrastructure|Openstack]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Foundational infrastructure | | |{{rating|3|3}} |- | [[Nova Resource:Metricsinfra/Documentation|metricsinfra]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Foundational infrastructure |[[Nova Resource:Metricsinfra]] | |{{rating|2|3}} |- | [[Ceph]] | || [[phab:tag/cloud-vps|cloud-vps]]|| Foundational infrastructure |[[Portal:Cloud VPS/Admin/Ceph]] | |{{rating|3|3}} |- | [[Help:Shared storage|Shared storage]] | Read/write NFS|| [[phab:tag/data-services/|data-services]]|| Foundational infrastructure |[[Portal:Data Services/Admin/Shared storage]] | |{{rating|3|3}} |- |[[Help:Wiki Replicas|Wiki Replicas]] | |[[phab:tag/data-services/|data-services]] |Managed Service |[[Portal:Data Services/Admin/Wiki Replicas]] | |{{rating|3|3}} |- |[[Dumps/Dump servers|Dumps servers]] |clouddumps |[[phab:tag/data-services/|data-services]] |Managed Service |[[Portal:Data Services/Admin/Dumps]] | |{{rating|2|3}} |} === Support levels === {{rating|3|3}} Fully supported. These services should be working and available to users at all times; any downtime will be publicly announced and documented. Users can rely on these services as reliable, long-term foundations for their work. {{rating|2|3}} Partial support. Service should be useful for some use cases and running most of the time but some features may be broken or consciously neglected. Outages will be tracked but not necessarily addressed immediately. Approach with caution when relying on these services for major projects. {{rating|1|3}} Little staff support. 
Service may be experimental, a work in progress, volunteer-maintained, or in the process of being phased out. Outages may be ignored entirely. Do not rely on stability of these services for your work. {{Note|All WMCS projects are held to a 'best effort' standard. We do not guarantee particular uptime metrics, and even urgent work will not take priority over sleep, crying babies, meals, etc.}} == See also == * [[Portal:Cloud VPS/Admin/Skill matrix]] k6onh1i4u2jxh3dt4lei2zvjyv9fgg8 Data Platform Engineering/Ops week/Analytics weekly train 0 459290 2414280 2413998 2026-05-15T16:35:11Z SNwachukwu (WMF) 39521 update deployment train items 2414280 wikitext text/x-wiki ...🚂🚃🚃🚃🚃🚃🚃🚃🚃🚃🚃🚃🚃🚃 == Analytics deployment train ==    ☑️ Only add here stuff that has been merged.    ☑️ Link the task and the Gerrit patch.    ☑️ List the systems that need deploying, jar versions that need bump-ups, and jobs that need restarting, if there are any.      Extra points if you include what to run and where to run it (e.g. stat1007, an-coord1001...).    ☑️ Do you have a way of checking the deployment has been successful?    ☑️ Don't move stuff to "ready to deploy" in the kanban unless it's documented here.    ☑️ Check [[Data Engineering/Ops week#The Data Engineering deployment train 🚂|Data_Engineering/Ops_week#The_Data_Engineering_deployment_train_'''🚂''']] for a '''pointer about Wikistats, as well as links for various types of deployments.'''    ☑️ To see the old log, go to [[etherpad:p/analytics-weekly-train/timeslider#59750|https://etherpad.wikimedia.org/p/analytics-weekly-train/timeslider#59747]]. '''Now use the log below.''' Eventually we could have some sub-pages or templates to streamline this. 
=== YYYY-MM-DD NEXT TUESDAY TRAIN (REPLACE THIS AFTER DEPLOY) === === NEXT TRAIN === Refinery Source: * 1286385: Add event_user_is_cross_wiki to wmf.mediawiki_history | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1286385 * 1285904: Add event_log_id to wmf.mediawiki_history | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1285904 * 1286481: Upgrade graphframes to 0.11.0 from Maven Central, drop Archiva repos | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1286481 * 1286397: Refactor MediawikiEvent.fromRow to use named column access | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1286397 * 1286989: Remove wmf-analytics-old-uploads Archiva repository | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1286989 * 1287508: Add Sanitizer to clean up wprov value of x-analytics. | <nowiki>https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1287508</nowiki> Refinery: please deploy Refinery after deploying Refinery Source above. * 1285903: Add event_log_id to mediawiki_history DDL | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1285903 * 1285527: querypage: Add UncategorizedImages.hql | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1285527 * [[gerrit:c/analytics/refinery/+/1283749|expand_event_sanitized_analytics_allowlist: Add revertrisk-multilingual predictions to allowlist. 
(1283749)]] * 1286383: Add event_user_is_cross_wiki to mediawiki_history DDL | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1286383 * 1287443: Add mediawiki_page_html_feature_counts_change_v1 to allowlist for event_sanitized | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1287443 * 1287909: Use SanitizeXAnalyticsWprovUDF to normalize x_analytics[wprov] values | <nowiki>https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1287909</nowiki> === '''Thursday, May 07, 2026''' === Deployer: Aisha Refinery: * 1277789: querypage: Add WantedCategories.hql | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1277789 * 1278746: Remove mediawiki_revision_score from sanitization main allowlist | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1278746 * 1279483: Remove SearchSatisfaction from sanitization analytics allowlist | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1279483 * 1279651: mediarequest_hourly: use file/filetypes as media_classification ground truth | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1279651 === '''Wednesday, April 29, 2026''' === Deployer: Antonio Refinery: * move hql script from fundraising to fr_tech | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1260793 * 1267966: querypage: MostCategories: Include all content namespaces | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1267966 * 1276836: querypage: Add UnusedTemplates.hql | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1276836 * 1275815: Update webrequest validation algorithm | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1275815 * 1275982: Remove DesktopWebUIActionsTracking, MobileWebUIActionsTracking, ReadingDepth from sanitization allowlist | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1275982<br /> === '''Thursday, March 25, 2026''' === Deployer: Aisha and Sandra Refinery: * Add abstract.wikipedia to pageview allowlist | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1256413 * Changes to mapper-weight 
for centralauth_localuser | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1256302 * Move bot detection pipeline into new repo | <nowiki>https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1237928</nowiki> === Thursday, March 10, 2026 === By mforns Refinery: * Add kai.wikipedia to the pageview allowlist https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1249328 DONE (sync'ed by hand) Airflow: * Artifact cleaning: remove outdated refinery job artifacts | https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/2030#d6baf582888b568d8a7bcb95316bd03cbefa9853 | '''Note this change might cause some backward compatibility issues and we would need to monitor the DAGs closely after deployment.''' DONE === 2026-03-11 === Deployer: joal Refinery-source: * Update ProduceCanaryEvents job https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1249982 + https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1250016 === 2026-03-05 (special Thursday post cleanups) === Deployer: dr0ptp4kt (with Marcel and Sandra) Refinery: * Adapt imagelinks pipeline and consumers for imagelink normalization | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1239200 * No-op: Fix druid banner_activity data prep job | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1240253 Airflow: * After refinery deployment. Pass mediawiki_private_linktarget_table to commons impact metrics dag | https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/2026 === 2026-02-18 === Deployer: joal Refinery: * Use names in banner activity GROUP BY - https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1239232 * Add first_campaign_status_code for banner activity - https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1238821 === 2026-02-10 === Deployer: xcollazo Refinery: * 1235830: MediawikiDumper: fix filenames to include end revision when covering a single page. 
| <nowiki>https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1235830</nowiki> * 1236347: Migrate cu_changes table to use cuua_text in new cu_usergent table. | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1236347 === 2026-02-01 === Deployer: Joseph Refinery: * 1233834: Remove mediawiki_wikitext_* from refinery-drop-mediawiki-snapshots | https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1233834 ** Minor non-urgent patch. No need to release if just this patch. * Update pageview project allowlist - https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1235201 * HQL for druid webrequest_sampled ingestion https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1235740 Airflow: * Load webrequest_sampled in druid hourly https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1967 === 2026-01-21 === Deployer: Antonio/Joseph *Refinery **Update pingback HQL code for new PHP and MediaWiki versions https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1222506 **Update pageview allowlist ***https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1225039 ***https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1229091 ***https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1229095 ***https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1229507 **Update event _sanitized allowlist https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1207489 *Airflow **Update pingback MediaWiki and PHP versions to include new values https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1909 ***We need the refinery deployment done first ***After deploying this, from Cindy: Once the patches are merged, the weekly queries will need to be re-run starting from the beginning of May 2025. Xcollazo is happy to do this part after we deploy. Just ping Xcollazo. 
=== 2025-12-03 === Deployer: Antoine '''Refinery:''' * {{PhabT|T409584}} Add JA3N User-Agent queries https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1212214 and https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1213488 and https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1213522 (no need to do anything else!) === 2025-11-18 === Deployer: Marcel and Javier '''Refinery:''' * {{PhabT|405039}} - Add HQL for edit_per_editor_per_page_daily and pageview_per_editor_per_page_daily https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1196892 DONE === 2025-11-12 === Deployer: Joal '''Refinery-source:''' * {{PhabT|406531}} - Add new referral sources to pageview data https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1203389 * {{PhabT|408178}} - Remove mediawiki.wikistories_* sanitization allowlist entries https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1202718 * '''[[phab:T407239|T407239]]''' '''-''' Fix duplicate Pageview metrics records in data quality tables. | <nowiki>https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1203129</nowiki> * [[phab:T406000|T406000 Adapt mediawiki_history to the removal of mediawiki revision.rev_sha1]] ({{Gerrit|1202334}}) * 1203124: Fix bug in MW Dumper in which vertical bars ( `|` ) were not being honored. 
| https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1203124 ** After refine-source release, we should: *** merge https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1795 that will pick up this fix on the File Export DAGs *** wait until merge request makes it to main Airflow instance *** delete DagProperties at https://airflow.wikimedia.org/variable/edit/372 , so that the auto-regenerated one points to new jar *** resume the following DAGs, which have been cleared and are ready to go: **** https://airflow.wikimedia.org/dags/mw_content_xml_export_current_mid_month/grid **** https://airflow.wikimedia.org/dags/mw_content_xml_export_current_monthly/grid **** https://airflow.wikimedia.org/dags/mw_content_xml_export_history_monthly/grid '''Airflow:''' * {{PhabT|406531}} - Add new referral sources to pageview data - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1796 * {{PhabT|409470}} - Fix mediawiki_history_dumps failure - https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1797 * https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1795 (see above in refinery-source section) === 2025-11-05 === Deployer: Joseph Refinery Source: * 1199485: Add Data quality check for Pageview Human-Bot ratio anomaly | [[gerrit:c/aalytics/refinery/source/+/1199485|https://gerrit.wikimedia.org/r/c/aalytics/refinery/source/+/1199485]] * {{PhabT|406531}} - Add new referral sources to pageview data https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1198313 * Mediawiki-History Bug fix: https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1202191 Airflow: * [[phab:T407239|T407239]] - Add Dag to run daily Human to Bot page views ratio check https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1776 This MR should be deployed after refinery source is deployed. 
It needs refinery-job jar v0.3.7
* {{PhabT|406531}} - Add new referral sources to pageview data https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1780 This MR should be deployed after refinery source is deployed. It needs refinery-hive jar v0.3.7

=== 2025-10-29 ===
Deployer: Sandra

Refinery Source:
* 1198080: Fix various bugs in MW Dumper code. | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1198080
* 1198152: Add utility to create SHA256 fingerprints of the files of a particular HDFS folder. | https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1198152

=== 2025-10-22 ===
To-be deployer: Aleksander
* Refinery Source
** Add user_central_id to the mediawiki_history dataset(s) https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1194951

=== 2025-10-14 ===
To-be deployer: Marcel
*Refinery
**{{PhabT|405533}} - Unique devices data uses non-standard domains for Wikidata, Wikifunctions, and MediaWiki.org https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1194885 .
Note: This task has a pending Airflow patch to be merged/deployed once this one is deployed: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1743 [DONE]
**{{PhabT|406000}} - Adapt mediawiki_history to the removal of mediawiki revision.rev_sha1 - https://gerrit.wikimedia.org/r/c/analytics/refinery/+/1196716 Nullify sha1 in Sqoop [DONE]
*Refinery Source
**[[phab:T365203|T365203]] - Add check for wikis count to Mediawiki history data quality checks https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1193440 [DONE]
**[[phab:T365203|T365203]] - Bug Fix: Add support for Deequ Metric value Distribution data type https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1195268 [DONE]
**{{PhabT|406000}} - Adapt mediawiki_history to the removal of mediawiki revision.rev_sha1 - https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1196049 and https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1196469. Note: This patch needs a related Airflow patch: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1750.
This one also: https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1196485 [DONE] **{{PhabT|T384945}} Modify code to dump all slots AND {{PabT|T405641}} Adapt MW Content pipelines to the removal of upstream revision.rev_sha1 - https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/1195330 [DONE] kvpzyqcvxfdfqn2279e9g5aw4h5gsom Fundraising/SAL 0 459699 2414266 2414123 2026-05-15T15:35:05Z Stashbot 7414 ejegg: payments-wiki upgraded from cf9ec80b to 1a056dc0 2414266 wikitext text/x-wiki == 2026-05-15 == * 15:35 ejegg: payments-wiki upgraded from {{Gerrit|cf9ec80b}} to {{Gerrit|1a056dc0}} * 00:02 eileen: civicrm upgraded from {{Gerrit|6d8ce7a3}} to {{Gerrit|6a2258ff}} == 2026-05-14 == * 22:22 ejegg: fundraising scheduled jobs re-enabled * 22:12 eileen: cv upgraded from {{Gerrit|f19e0961}} to {{Gerrit|b8a8dd6a}} * 22:11 ejegg: fundraising civicrm upgraded from {{Gerrit|e25fa223}} to {{Gerrit|6d8ce7a3}} * 22:09 ejegg: fundraising scheduled jobs disabled for Civi update * 19:29 ejegg: re-enabled fundraising scheduled jobs * 19:06 ejegg: fundraising civicrm upgraded from {{Gerrit|950908ec}} to {{Gerrit|e25fa223}} * 19:04 ejegg: disabled fundraising scheduled jobs for CiviCRM deployment * 16:48 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|bb833986}} to {{Gerrit|a2a3a015}} == 2026-05-12 == * 16:07 ejegg: fundraising civicrm upgraded from {{Gerrit|24ac90e7}} to {{Gerrit|60dcc28f}} == 2026-05-06 == * 12:29 eileen: config revision changed from {{Gerrit|41cfd677}} to {{Gerrit|00752f91}} * 10:56 eileen: SmashPig upgraded from {{Gerrit|4201ef56}} to {{Gerrit|bb833986}} * 09:59 eileen: civicrm upgraded from {{Gerrit|4d9c8600}} to {{Gerrit|24ac90e7}} * 09:15 eileen: civicrm upgraded from {{Gerrit|38dcf7a8}} to {{Gerrit|4d9c8600}} == 2026-05-04 == * 17:39 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|1be60746}} to {{Gerrit|4201ef56}} == 2026-05-03 == * hackathon: civicrm upgraded from {{Gerrit|0afbc8ea}} to {{Gerrit|38dcf7a8}} == 
2026-05-02 == * 09:44 eileen: civicrm upgraded from {{Gerrit|7556c5c7}} to {{Gerrit|0afbc8ea}} * 09:43 eileen: SmashPig upgraded from {{Gerrit|88a1bcba}} to {{Gerrit|1be60746}} == 2026-05-01 == * 16:36 eileen: civicrm upgraded from {{Gerrit|1a835879}} to {{Gerrit|7556c5c7}} * 15:08 eileen: civicrm upgraded from {{Gerrit|9ed32632}} to {{Gerrit|1a835879}} * 14:02 eileen: civicrm upgraded from {{Gerrit|081d5a29}} to {{Gerrit|9ed32632}} == 2026-04-29 == * 13:51 jgleeson: payments-wiki upgraded from {{Gerrit|2e2eb8a2}} to {{Gerrit|4e0c944b}} * 13:49 jgleeson: tools upgraded from {{Gerrit|f52a5dcf}} to {{Gerrit|afbd0f67}} == 2026-04-28 == * 20:58 larssandergreen: civicrm upgraded from {{Gerrit|be3bb76b}} to {{Gerrit|081d5a29}} == 2026-04-27 == * 19:04 ejegg: fundraising civicrm upgraded from {{Gerrit|3f8d49fa}} to {{Gerrit|be3bb76b}} * 19:02 ejegg: payments-wiki upgraded from {{Gerrit|b1a352af}} to {{Gerrit|5265089d}} * 18:58 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|572b69da}} to {{Gerrit|88a1bcba}} == 2026-04-23 == * 20:40 ejegg: payments-wiki upgraded from {{Gerrit|6e78ef91}} to {{Gerrit|b1a352af}} * 19:52 ejegg: civicrm fundraising upgraded from {{Gerrit|5ea4c8d3}} to {{Gerrit|3f8d49fa}} * 19:27 ejegg: SmashPig upgraded from {{Gerrit|f1b3f3d9}} to {{Gerrit|572b69da}} * 16:32 larssandergreen: civicrm upgraded from {{Gerrit|53a0b46f}} to {{Gerrit|5ea4c8d3}} * 16:31 larssandergreen: tools upgraded from {{Gerrit|edca3f63}} to {{Gerrit|f52a5dcf}} == 2026-04-22 == * 02:10 eileen: civicrm upgraded from {{Gerrit|abd23ad7}} to {{Gerrit|53a0b46f}} == 2026-04-21 == * 23:16 cstone: civicrm upgraded from {{Gerrit|22f24ae4}} to {{Gerrit|abd23ad7}} * 19:46 larssandergreen: civicrm upgraded from {{Gerrit|ddc1f044}} to {{Gerrit|22f24ae4}} == 2026-04-20 == * 19:21 larssandergreen: tools upgraded from {{Gerrit|26ab0125}} to {{Gerrit|edca3f63}} * 15:09 ejegg: payments-wiki upgraded from {{Gerrit|86a42498}} to {{Gerrit|6e78ef91}} == 2026-04-17 == * 01:08 
larssandergreen: civicrm upgraded from {{Gerrit|90c0ccd9}} to {{Gerrit|ddc1f044}} == 2026-04-16 == * 16:39 larssandergreen: tools upgraded from {{Gerrit|f14a814e}} to {{Gerrit|26ab0125}} * 14:20 larssandergreen: civicrm upgraded from {{Gerrit|801847a7}} to {{Gerrit|90c0ccd9}} * 14:19 larssandergreen: tools upgraded from {{Gerrit|9bff5f07}} to {{Gerrit|f14a814e}} * 02:39 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|61fee241}} to {{Gerrit|f1b3f3d9}} == 2026-04-15 == * 21:50 eileen: civicrm upgraded from {{Gerrit|6f33b6d0}} to {{Gerrit|801847a7}} * 06:14 eileen: civicrm upgraded from {{Gerrit|a047bf92}} to {{Gerrit|6f33b6d0}} * 01:20 eileen: SmashPig upgraded from {{Gerrit|100101fb}} to {{Gerrit|61fee241}} == 2026-04-14 == * 23:58 eileen: civicrm upgraded from {{Gerrit|eb3d73e4}} to {{Gerrit|a047bf92}} * 23:48 wfan: payments-wiki upgraded from {{Gerrit|26f5451a}} to {{Gerrit|c3b34f99}} * 22:31 eileen: civicrm upgraded from {{Gerrit|2058927e}} to {{Gerrit|eb3d73e4}} * 22:25 eileen: civicrm upgraded from {{Gerrit|2058927e}} to {{Gerrit|eb3d73e4}} * 18:06 ejegg: fundraising civicrm upgraded from {{Gerrit|fccf9b3a}} to {{Gerrit|2058927e}} == 2026-04-13 == * 18:20 ejegg: fundraising civicrm upgraded from {{Gerrit|fa20eb0a}} to {{Gerrit|fccf9b3a}} * 17:04 ejegg: re-enabled recurring donation charge jobs * 16:29 ejegg: fundraising civicrm upgraded from {{Gerrit|eb188fa2}} to {{Gerrit|fa20eb0a}} * 16:27 ejegg: disabled recurring donation charge jobs for code / settings update * 12:44 jgleeson: donorwiki upgraded from {{Gerrit|064a770e}} to {{Gerrit|26f5451a}} == 2026-04-10 == * 15:35 jgleeson: payments-wiki upgraded from {{Gerrit|c017d7e7}} to {{Gerrit|dd45f867}} == 2026-04-09 == * 23:31 wfan: payments-wiki upgraded from {{Gerrit|064a770e}} to {{Gerrit|c017d7e7}} * 21:49 ejegg: fundraising civicrm upgraded from {{Gerrit|3d3c0a62}} to {{Gerrit|eb188fa2}} * 19:00 larssandergreen: tools upgraded from {{Gerrit|986f7f83}} to {{Gerrit|9bff5f07}} * 13:08 
jgleeson: civicrm upgraded from {{Gerrit|d8d3871c}} to {{Gerrit|3d3c0a62}} * 11:53 jgleeson: SmashPig upgraded from {{Gerrit|5c083891}} to {{Gerrit|100101fb}} * 01:20 ejegg: fundraising civicrm upgraded from {{Gerrit|e60321bb}} to {{Gerrit|d8d3871c}} == 2026-04-08 == * 17:44 ejegg: fundraising civicrm upgraded from {{Gerrit|4ee0b5e8}} to {{Gerrit|e60321bb}} * 15:12 ejegg: payments-wiki upgraded from {{Gerrit|1ad85e6c}} to {{Gerrit|064a770e}} * 01:19 ejegg: donorwiki upgraded from {{Gerrit|1ad85e6c}} to {{Gerrit|064a770e}} * 00:26 dwisehaupt: cloning new frdb frdb1008 from frdb2005 == 2026-04-07 == * 18:34 wfan: civicrm upgraded from {{Gerrit|9104e70b}} to {{Gerrit|6f762e29}} == 2026-04-06 == * 20:42 ejegg: re-enabled recurring donation charge job * 20:33 wfan: donorwiki upgraded from {{Gerrit|c2d03117}} to {{Gerrit|1ad85e6c}} * 20:32 wfan: payments-wiki upgraded from {{Gerrit|80cda166}} to {{Gerrit|1ad85e6c}} * 16:53 ejegg: disabled recurring donations charge job while diagnosing gr4vy routing errors * 16:03 ejegg: civicrm upgraded from {{Gerrit|4ee11209}} to {{Gerrit|9104e70b}} == 2026-04-03 == * 00:04 wfan: civicrm upgraded from {{Gerrit|49f541cd}} to {{Gerrit|4ee11209}} == 2026-04-02 == * 21:38 cstone: payments-wiki upgraded from {{Gerrit|86bec442}} to {{Gerrit|80cda166}} * 05:24 eileen: civicrm upgraded from {{Gerrit|c512abc6}} to {{Gerrit|49f541cd}} * 02:39 eileen: civicrm upgraded from {{Gerrit|bbed1291}} to {{Gerrit|c512abc6}} * 02:16 eileen: SmashPig upgraded from {{Gerrit|9af71a7c}} to {{Gerrit|18ea746a}} == 2026-04-01 == * 18:57 eileen: civicrm upgraded from {{Gerrit|a1bf4768}} to {{Gerrit|bbed1291}} * 04:11 eileen: civicrm upgraded from {{Gerrit|11a2f9ab}} to {{Gerrit|a1bf4768}} * 03:18 ejegg: payments-wiki upgraded from {{Gerrit|02bf54b0}} to {{Gerrit|86bec442}} == 2026-03-31 == * 22:03 jgleeson: tools upgraded from {{Gerrit|9985e723}} to {{Gerrit|986f7f83}} * 20:16 eileen: civicrm upgraded from {{Gerrit|c3cc3562}} to {{Gerrit|b468301c}} * 18:40 
jgleeson: tools upgraded from {{Gerrit|161049ac}} to {{Gerrit|9985e723}} * 17:39 ejegg: Standalone (IPN listener) SmashPig upgraded from {{Gerrit|abf8682a}} to {{Gerrit|9af71a7c}} * 16:38 jgleeson: tools upgraded from {{Gerrit|f605b570}} to {{Gerrit|161049ac}} * 16:24 jgleeson: donorwiki updated from {{Gerrit|d79a98b5}} to {{Gerrit|c2d03117}} * 02:46 eileen: civicrm upgraded from {{Gerrit|591bef29}} to {{Gerrit|c3cc3562}} * 01:00 eileen: civicrm upgraded from {{Gerrit|cf871dd3}} to {{Gerrit|591bef29}} == 2026-03-30 == * 23:19 eileen: civicrm upgraded from {{Gerrit|7d299b48}} to {{Gerrit|cf871dd3}} * 21:09 eileen: civicrm upgraded from {{Gerrit|3724cc2d}} to {{Gerrit|7d299b48}} * 20:40 eileen: civicrm upgraded from {{Gerrit|58426b1e}} to {{Gerrit|3724cc2d}} * 14:51 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|545f0b10}} to {{Gerrit|abf8682a}} == 2026-03-28 == * 02:57 ejegg: payments-wiki upgraded from {{Gerrit|d79a98b5}} to {{Gerrit|b239a6b7}} * 02:27 eileen: civicrm upgraded from {{Gerrit|7138d524}} to {{Gerrit|58426b1e}} == 2026-03-27 == * 22:37 eileen: civicrm upgraded from {{Gerrit|c51e98cc}} to {{Gerrit|7138d524}} * 17:50 dwisehaupt: Corrected value: on frdb1005 running the following in mysql to up the buffer pool to ~412G: set global innodb_buffer_pool_size = 431669379072; * 17:34 dwisehaupt: on frdb1005 running the following in mysql to up the buffer pool to ~412G: set global innodb_buffer_pool_size = 421552128; == 2026-03-26 == * 20:00 jgleeson: donorwiki upgraded from {{Gerrit|48cc2e9d}} to {{Gerrit|d79a98b5}} * 19:59 jgleeson: payments-wiki upgraded from {{Gerrit|b387c6ba}} to {{Gerrit|d79a98b5}} * 19:48 ejegg: fundraising civicrm upgraded from {{Gerrit|88b497b8}} to {{Gerrit|db102c77}} * 19:25 larssandergreen: civicrm upgraded from {{Gerrit|26dab2f0}} to {{Gerrit|88b497b8}} * 16:26 jgleeson: SmashPig upgraded from {{Gerrit|5d8a0330}} to {{Gerrit|545f0b10}} * 02:13 eileen: civicrm upgraded from {{Gerrit|a752f9e8}} to {{Gerrit|26dab2f0}} 
* 00:05 eileen: civicrm upgraded from {{Gerrit|277fe75e}} to {{Gerrit|a752f9e8}} == 2026-03-25 == * 21:53 eileen: civicrm upgraded from {{Gerrit|97319295}} to {{Gerrit|277fe75e}} * 19:58 eileen: civicrm upgraded from {{Gerrit|d3dfedb4}} to {{Gerrit|97319295}} == 2026-03-24 == * 19:32 wfan: payments-wiki upgraded from {{Gerrit|fce3fab5}} to {{Gerrit|b387c6ba}} * 04:24 eileen: civicrm upgraded from {{Gerrit|0aef661f}} to {{Gerrit|b2ed875e}} * 03:38 eileen: config revision changed from {{Gerrit|ded0c289}} to {{Gerrit|16592428}} schedule stripe download * 03:33 eileen: config revision changed from {{Gerrit|79e052e4}} to {{Gerrit|ded0c289}} temporarily disable adyen audit parse - let's fix those misplaced IDs * 01:55 eileen: civicrm upgraded from {{Gerrit|b2c7f1d0}} to {{Gerrit|0aef661f}} * 00:08 eileen: civicrm upgraded from {{Gerrit|80344f51}} to {{Gerrit|b2c7f1d0}} == 2026-03-23 == * 21:30 eileen: config revision changed from {{Gerrit|8c5587f3}} to {{Gerrit|2dd50e7c}} * 18:55 wfan: civicrm upgraded from {{Gerrit|675455b2}} to {{Gerrit|80344f51}} * 17:27 larssandergreen: tools upgraded from {{Gerrit|e60f63b3}} to {{Gerrit|f605b570}} * 15:59 ejegg: civicrm upgraded from {{Gerrit|a2d4b17c}} to {{Gerrit|675455b2}} * 12:25 jgleeson: payments-wiki upgraded from {{Gerrit|48cc2e9d}} to {{Gerrit|91d9eee9}} == 2026-03-20 == * 00:34 eileen: * civicrm upgraded from {{Gerrit|adc36173}} to {{Gerrit|a2d4b17c}} * 00:31 cstone: payments-wiki upgraded from {{Gerrit|f3420a6f}} to {{Gerrit|48cc2e9d}} * 00:27 eileen: config revision changed from {{Gerrit|a7486f6a}} to {{Gerrit|a1a426f3}} * 00:27 eileen: SmashPig upgraded from {{Gerrit|78a8e70a}} to {{Gerrit|5d8a0330}} == 2026-03-19 == * 14:08 damilare: civiproxy upgraded from {{Gerrit|6625c844}} to {{Gerrit|38ba8348}} == 2026-03-17 == * 18:40 jgleeson: donorwiki upgraded from {{Gerrit|4c09db39}} to {{Gerrit|7d1666f9}} * 06:20 eileen: civicrm upgraded from {{Gerrit|7fe14629}} to {{Gerrit|adc36173}} * 05:06 eileen: civicrm upgraded from 
{{Gerrit|e622a222}} to {{Gerrit|7fe14629}} * 03:48 eileen: civicrm upgraded from {{Gerrit|5360f9ad}} to {{Gerrit|e622a222}} * 02:13 eileen: civicrm upgraded from {{Gerrit|3283e3ca}} to {{Gerrit|e73c6b50}} == 2026-03-15 == * 23:56 eileen: civicrm upgraded from {{Gerrit|dce257f0}} to {{Gerrit|3283e3ca}} * 19:43 eileen: civicrm upgraded from {{Gerrit|a1279ee4}} to {{Gerrit|dce257f0}} == 2026-03-13 == * ish: payments-wiki upgraded from {{Gerrit|f40a1153}} to {{Gerrit|f3420a6f}} == 2026-03-11 == * 21:20 larssandergreen: civicrm upgraded from {{Gerrit|c2c716ca}} to {{Gerrit|a1279ee4}} * 19:15 eileen: civicrm upgraded from {{Gerrit|81baf495}} to {{Gerrit|c2c716ca}} * 07:18 eileen: config revision changed from {{Gerrit|ed2295ab}} to {{Gerrit|a7486f6a}} * 07:02 eileen: civicrm upgraded from {{Gerrit|f418297f}} to {{Gerrit|81baf495}} * 02:28 eileen: civicrm upgraded from {{Gerrit|14e8200e}} to {{Gerrit|da26f37d}} * 00:06 eileen: civicrm upgraded from {{Gerrit|fbb38eda}} to {{Gerrit|14e8200e}} == 2026-03-10 == * 22:08 eileen: civicrm upgraded from {{Gerrit|ef319ea3}} to {{Gerrit|fbb38eda}} * 19:35 eileen: civicrm upgraded from {{Gerrit|773d9fb9}} to {{Gerrit|ef319ea3}} * 06:06 eileen: config revision changed from {{Gerrit|b9bc2a20}} to {{Gerrit|60ef6709}} == 2026-03-09 == * 17:37 damilare: payments-wiki upgraded from {{Gerrit|5b747b97}} to {{Gerrit|f40a1153}} == 2026-03-06 == * 23:39 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|ed16a2ea}} to {{Gerrit|78a8e70a}} * 21:48 ejegg: fundraising civicrm upgraded from {{Gerrit|8aadcd81}} to {{Gerrit|773d9fb9}} * 21:13 ejegg: civicrm fundraising upgraded from {{Gerrit|a1f32ed6}} to {{Gerrit|8aadcd81}} * 20:35 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|217fc7fc}} to {{Gerrit|ed16a2ea}} * 19:18 ejegg: civicrm upgraded from {{Gerrit|f2633c89}} to {{Gerrit|a1f32ed6}} * 03:08 larssandergreen: civicrm upgraded from {{Gerrit|fbac3ce7}} to {{Gerrit|2bae36fa}} * 02:09 ejegg: payments-wiki upgraded from 
{{Gerrit|9ae5bf60}} to {{Gerrit|5b747b97}} == 2026-03-05 == * 17:53 ejegg: donorwiki upgraded from {{Gerrit|7329b41d}} to {{Gerrit|4c09db39}} * 05:59 eileen: ivicrm upgraded from {{Gerrit|11e5a5d8}} to {{Gerrit|fbac3ce7}} * 05:08 eileen: * civicrm upgraded from {{Gerrit|8bdce85f}} to {{Gerrit|11e5a5d8}} == 2026-03-04 == * 18:52 jgleeson: tools upgraded from {{Gerrit|a3568ffc}} to {{Gerrit|e60f63b3}} * 03:21 ejegg: payments-wiki upgraded from {{Gerrit|5e4939a3}} to {{Gerrit|9ae5bf60}} * 03:20 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|78960c68}} to {{Gerrit|217fc7fc}} == 2026-03-03 == * 20:42 eileen: civicrm upgraded from {{Gerrit|b610f844}} to {{Gerrit|8bdce85f}} * 17:09 dwisehaupt: latest php8.2 updates installed on civi1002 * 06:00 eileen: * civicrm upgraded from {{Gerrit|f4a70c82}} to {{Gerrit|b610f844}} == 2026-03-02 == * 20:40 eileen: cv upgraded from {{Gerrit|dfeedcbe}} to {{Gerrit|f19e0961}} * 18:41 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|8cd1593b}} to {{Gerrit|78960c68}} * 15:13 ejegg: donorwiki upgraded from {{Gerrit|f5d7179a}} to {{Gerrit|7329b41d}} == 2026-02-27 == * 19:00 ejegg: fundraising civicrm upgraded from {{Gerrit|162bbf7c}} to {{Gerrit|f4a70c82}} * 12:24 jgleeson: payments-wiki upgraded from {{Gerrit|f4fd71ff}} to {{Gerrit|a8a9ce78}} * 11:25 jgleeson: payments-wiki upgraded from {{Gerrit|974af222}} to {{Gerrit|f4fd71ff}} == 2026-02-26 == * 22:36 eileen: civicrm upgraded from {{Gerrit|6995dc03}} to {{Gerrit|162bbf7c}} * 21:52 ejegg: civicrm upgraded from {{Gerrit|2a6001d3}} to {{Gerrit|6995dc03}} * 17:44 larssandergreen: civicrm upgraded from {{Gerrit|3ac93e95}} to {{Gerrit|2a6001d3}} * 07:14 eileen: civicrm upgraded from {{Gerrit|c3157fbf}} to {{Gerrit|3ac93e95}} == 2026-02-25 == * 18:33 ejegg: fundraising civicrm upgraded from {{Gerrit|f881e026}} to {{Gerrit|c3157fbf}} * 03:37 eileen: config revision changed from {{Gerrit|390d6434}} to {{Gerrit|a0228e6c}} turn off trustly audit == 2026-02-24 == * 
22:48 ejegg: payments-wiki upgraded from {{Gerrit|e5f73610}} to {{Gerrit|974af222}} * 19:47 ejegg: payments-wiki upgraded from {{Gerrit|f5d7179a}} to {{Gerrit|e5f73610}} * 02:06 eileen: config revision changed from {{Gerrit|71c98072}} to {{Gerrit|390d6434}} reenabled trustly audit == 2026-02-23 == * 23:42 eileen: civicrm upgraded from {{Gerrit|f0710864}} to {{Gerrit|f881e026}} * 22:28 ejegg: fundraising civicrm upgraded from {{Gerrit|9d58ce4a}} to {{Gerrit|f0710864}} * 16:43 jgleeson: payments-wiki upgraded from {{Gerrit|0127f2d8}} to {{Gerrit|f5d7179a}} == 2026-02-21 == * 01:15 dwisehaupt: updating localsettings from {{Gerrit|71c98072}} to {{Gerrit|534fbf34}} and syncing civicrm to push update for large_donation_notifications == 2026-02-20 == * 20:38 ejegg: donorwiki upgraded from {{Gerrit|f7a0ee6b}} to {{Gerrit|f5d7179a}} == 2026-02-19 == * 21:26 ejegg: payments-wiki upgraded from {{Gerrit|f7a0ee6b}} to {{Gerrit|0127f2d8}} * 06:37 eileen: civicrm upgraded from {{Gerrit|ac30e19f}} to {{Gerrit|9d58ce4a}} == 2026-02-18 == * 22:21 dwisehaupt: disabling apache2::mod::dump_io on civicrm role for debugging 500 errors after testing. 
can be re-enabled by reverting commit {{Gerrit|4b1d94399}} - [[phab:T417310|T417310]] * 20:38 eileen: civicrm upgraded from {{Gerrit|f5020a85}} to {{Gerrit|ac30e19f}} * 19:22 eileen: civicrm upgraded from {{Gerrit|66d2e1dd}} to {{Gerrit|f5020a85}} * 17:48 ejegg: donorwiki upgraded from {{Gerrit|488431ec}} to {{Gerrit|f7a0ee6b}} * 16:51 dwisehaupt: enabling apache2::mod::dump_io on civicrm role for debugging 500 errors - [[phab:T417310|T417310]] * 06:02 eileen: civicrm upgraded from {{Gerrit|4c3cdcde}} to {{Gerrit|66d2e1dd}} * 05:33 eileen: config revision changed from {{Gerrit|605c6946}} to {{Gerrit|368156fa}} * 05:21 eileen: civicrm upgraded from {{Gerrit|caad5ab9}} to {{Gerrit|4c3cdcde}} == 2026-02-17 == * 17:34 larssandergreen: payments-wiki upgraded from {{Gerrit|c506d590}} to {{Gerrit|488431ec}} * 17:33 larssandergreen: donorwiki upgraded from {{Gerrit|93e0d03f}} to {{Gerrit|488431ec}} * 03:02 eileen: civicrm upgraded from {{Gerrit|89782fc6}} to {{Gerrit|caad5ab9}} == 2026-02-16 == * 20:16 eileen: civicrm upgraded from {{Gerrit|2b227403}} to {{Gerrit|89782fc6}} == 2026-02-15 == * 23:55 eileen: civicrm upgraded from {{Gerrit|de8252c7}} to {{Gerrit|2b227403}} * 21:07 eileen: civicrm upgraded from {{Gerrit|038f5bca}} to {{Gerrit|de8252c7}} == 2026-02-13 == * 16:03 ejegg: payments-wiki upgraded from {{Gerrit|5793a405}} to {{Gerrit|c506d590}} * away: SmashPig upgraded from {{Gerrit|fea03fcc}} to {{Gerrit|8cd1593b}} == 2026-02-12 == * 22:49 eileen: civicrm upgraded from {{Gerrit|c6c0d453}} to {{Gerrit|038f5bca}} * 21:45 jgleeson: payments-wiki upgraded from {{Gerrit|9dbf0ece}} to {{Gerrit|5793a405}} * 21:15 jgleeson: payments-wiki upgraded from {{Gerrit|6c1a522f}} to {{Gerrit|9dbf0ece}} * 19:03 larssandergreen: tools upgraded from {{Gerrit|645cf5dc}} to {{Gerrit|a3568ffc}} * 00:34 larssandergreen: civicrm upgraded from {{Gerrit|e13111f3}} to {{Gerrit|c6c0d453}} == 2026-02-11 == * 21:12 eileen: civicrm upgraded from {{Gerrit|6e57071a}} to {{Gerrit|e13111f3}} * 06:27 
eileen: civicrm upgraded from {{Gerrit|98c325dd}} to {{Gerrit|6e57071a}} * 04:55 larssandergreen: tools upgraded from {{Gerrit|7462b8bd}} to {{Gerrit|645cf5dc}} * 03:37 eileen: civicrm upgraded from {{Gerrit|953cf9f2}} to {{Gerrit|98c325dd}} == 2026-02-09 == * 15:35 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|937d6e40}} to {{Gerrit|fea03fcc}} == 2026-02-08 == * 23:47 eileen: civicrm upgraded from {{Gerrit|bc3a8036}} to {{Gerrit|953cf9f2}} * 20:46 eileen: civicrm upgraded from {{Gerrit|40189afa}} to {{Gerrit|bc3a8036}} == 2026-02-06 == * 15:41 ejegg: fundraising civicrm upgraded from {{Gerrit|06842fbf}} to {{Gerrit|40189afa}} * 06:24 eileen: civicrm upgraded from {{Gerrit|b63d7146}} to {{Gerrit|06842fbf}} == 2026-02-05 == * 15:25 dwisehaupt: all eqiad hosts powered down and ready for relocation. * 15:03 dwisehaupt: starting poweroff of eqiad hosts * 14:48 dwisehaupt: downtimes scheduled for frack eqiad hosts and cross colo replication in prep for rack expansion - [[phab:T403035|T403035]] * 04:30 eileen: civicrm upgraded from {{Gerrit|000dd548}} to {{Gerrit|b63d7146}} == 2026-02-04 == * 23:56 eileen: civicrm upgraded from {{Gerrit|4c2870c5}} to {{Gerrit|000dd548}} * 23:47 cstone: donorwiki upgraded from {{Gerrit|53bfb05b}} to {{Gerrit|93e0d03f}} * 21:51 ejegg: payments-wiki upgraded from {{Gerrit|a09a4f8f}} to {{Gerrit|93e0d03f}} * 21:46 eileen: civicrm upgraded from {{Gerrit|dd10342d}} to {{Gerrit|4c2870c5}} * 21:22 eileen: civicrm upgraded from {{Gerrit|8aa9274a}} to {{Gerrit|dd10342d}} * 03:48 eileen: * civicrm upgraded from {{Gerrit|14c7b7e7}} to {{Gerrit|8aa9274a}} == 2026-02-03 == * 21:55 dwisehaupt: as part of dns cleanup, we have removed the old civi1002.wikimedia.org entry. folks should be using civicrm.wm.o but there is a chance of super old bookmarks still being around. we can reinstate if it becomes an issue. 
* 21:05 eileen: config revision changed from {{Gerrit|23b2d9b6}} to {{Gerrit|45d40cf1}} * 21:01 larssandergreen: civicrm upgraded from {{Gerrit|10ab3659}} to {{Gerrit|14c7b7e7}} * 05:49 eileen: config revision changed from {{Gerrit|f348441d}} to {{Gerrit|23b2d9b6}} * 05:48 eileen: civicrm upgraded from {{Gerrit|a097bb3d}} to {{Gerrit|10ab3659}} * 01:16 wfan: civicrm upgraded from {{Gerrit|5b9f3cd4}} to {{Gerrit|a097bb3d}} * 00:24 wfan: donorwiki upgraded from {{Gerrit|3ffc70f0}} to {{Gerrit|53bfb05b}} == 2026-02-02 == * 23:50 wfan: payments-wiki upgraded from {{Gerrit|c035aa84}} to {{Gerrit|53bfb05b}} * 20:22 cstone: civicrm upgraded from {{Gerrit|f91f955b}} to {{Gerrit|5b9f3cd4}} * 18:55 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|b42f2de4}} to {{Gerrit|937d6e40}} * 17:01 wfan: payments-wiki upgraded from {{Gerrit|c5cadd72}} to {{Gerrit|c035aa84}} * 15:26 ejegg: fundraising civicrm upgraded from {{Gerrit|611d18de}} to {{Gerrit|f91f955b}} == 2026-01-30 == * 17:02 larssandergreen: civicrm upgraded from {{Gerrit|79e4424e}} to {{Gerrit|611d18de}} * 04:18 cstone: civicrm upgraded from {{Gerrit|fe1af57a}} to {{Gerrit|79e4424e}} * 02:51 eileen: civicrm upgraded from {{Gerrit|ff772bee}} to {{Gerrit|fe1af57a}} * 02:28 eileen: civicrm upgraded from {{Gerrit|bcf976ae}} to {{Gerrit|ff772bee}} == 2026-01-29 == * 22:16 eileen: civicrm upgraded from {{Gerrit|5d121c63}} to {{Gerrit|bcf976ae}} * 20:17 eileen: civicrm upgraded from {{Gerrit|ebcfd009}} to {{Gerrit|5d121c63}} * 20:09 eileen: civicrm upgraded from {{Gerrit|5c065c4e}} to {{Gerrit|ebcfd009}} * 18:08 wfan: payments-wiki upgraded from {{Gerrit|81d9f614}} to {{Gerrit|c5cadd72}} == 2026-01-28 == * 19:36 jgleeson: SmashPig upgraded from {{Gerrit|96a6224d}} to {{Gerrit|b42f2de4}} * 18:53 jgleeson: civicrm upgraded from {{Gerrit|56c222da}} to {{Gerrit|5c065c4e}} * 18:53 jgleeson: SmashPig upgraded from {{Gerrit|96a6224d}} to {{Gerrit|b42f2de4}} * 14:41 jgleeson: payments-wiki upgraded from 
{{Gerrit|24915bdb}} to {{Gerrit|81d9f614}} * 07:30 eileen: civicrm upgraded from {{Gerrit|600b21a6}} to {{Gerrit|56c222da}} * 06:51 eileen: * civicrm upgraded from {{Gerrit|32f9a10d}} to {{Gerrit|600b21a6}} * 04:35 eileen: config revision changed from {{Gerrit|ed0808a9}} to {{Gerrit|ef6ef5f2}} * 03:42 eileen: civicrm upgraded from {{Gerrit|64267a34}} to {{Gerrit|32f9a10d}} * 02:13 eileen: civicrm upgraded from {{Gerrit|7299615a}} to {{Gerrit|64267a34}} == 2026-01-27 == * 21:12 larssandergreen: tools upgraded from {{Gerrit|84323460}} to {{Gerrit|7462b8bd}} * 05:57 cstone: civicrm upgraded from {{Gerrit|19f94835}} to {{Gerrit|75f443b5}} == 2026-01-26 == * 17:15 damilare: smashpig upgraded from {{Gerrit|8b4ebf34}} to {{Gerrit|96a6224d}} * 16:32 larssandergreen: tools upgraded from {{Gerrit|c75f7625}} to {{Gerrit|84323460}} * 01:43 eileen: config revision changed from {{Gerrit|2f71107f}} to {{Gerrit|ed0808a9}} switch to php dlocal downloader (now weekend is mostly over) - == 2026-01-25 == * 21:47 eileen: config revision changed from {{Gerrit|23023984}} to {{Gerrit|2f71107f}} == 2026-01-24 == * 02:19 cstone: civicrm upgraded from {{Gerrit|f7064a46}} to {{Gerrit|19f94835}} * 00:55 bd808: Testing #wikimedia-fundraising SAL integration ([[phab:T415389|T415389]]) <noinclude>[[Category:SAL]]</noinclude> 8tixzzzxq2avyq6gw3pjuz73rex882j 2414275 2414266 2026-05-15T15:59:05Z Stashbot 7414 ejegg: donorwiki upgraded from 26f5451a to 1a056dc0 2414275 wikitext text/x-wiki == 2026-05-15 == * 15:59 ejegg: donorwiki upgraded from {{Gerrit|26f5451a}} to {{Gerrit|1a056dc0}} * 15:35 ejegg: payments-wiki upgraded from {{Gerrit|cf9ec80b}} to {{Gerrit|1a056dc0}} * 00:02 eileen: civicrm upgraded from {{Gerrit|6d8ce7a3}} to {{Gerrit|6a2258ff}} == 2026-05-14 == * 22:22 ejegg: fundraising scheduled jobs re-enabled * 22:12 eileen: cv upgraded from {{Gerrit|f19e0961}} to {{Gerrit|b8a8dd6a}} * 22:11 ejegg: fundraising civicrm upgraded from {{Gerrit|e25fa223}} to {{Gerrit|6d8ce7a3}} * 22:09 ejegg: 
fundraising scheduled jobs disabled for Civi update * 19:29 ejegg: re-enabled fundraising scheduled jobs * 19:06 ejegg: fundraising civicrm upgraded from {{Gerrit|950908ec}} to {{Gerrit|e25fa223}} * 19:04 ejegg: disabled fundraising scheduled jobs for CiviCRM deployment * 16:48 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|bb833986}} to {{Gerrit|a2a3a015}} == 2026-05-12 == * 16:07 ejegg: fundraising civicrm upgraded from {{Gerrit|24ac90e7}} to {{Gerrit|60dcc28f}} == 2026-05-06 == * 12:29 eileen: config revision changed from {{Gerrit|41cfd677}} to {{Gerrit|00752f91}} * 10:56 eileen: SmashPig upgraded from {{Gerrit|4201ef56}} to {{Gerrit|bb833986}} * 09:59 eileen: civicrm upgraded from {{Gerrit|4d9c8600}} to {{Gerrit|24ac90e7}} * 09:15 eileen: civicrm upgraded from {{Gerrit|38dcf7a8}} to {{Gerrit|4d9c8600}} == 2026-05-04 == * 17:39 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|1be60746}} to {{Gerrit|4201ef56}} == 2026-05-03 == * hackathon: civicrm upgraded from {{Gerrit|0afbc8ea}} to {{Gerrit|38dcf7a8}} == 2026-05-02 == * 09:44 eileen: civicrm upgraded from {{Gerrit|7556c5c7}} to {{Gerrit|0afbc8ea}} * 09:43 eileen: SmashPig upgraded from {{Gerrit|88a1bcba}} to {{Gerrit|1be60746}} == 2026-05-01 == * 16:36 eileen: civicrm upgraded from {{Gerrit|1a835879}} to {{Gerrit|7556c5c7}} * 15:08 eileen: civicrm upgraded from {{Gerrit|9ed32632}} to {{Gerrit|1a835879}} * 14:02 eileen: civicrm upgraded from {{Gerrit|081d5a29}} to {{Gerrit|9ed32632}} == 2026-04-29 == * 13:51 jgleeson: payments-wiki upgraded from {{Gerrit|2e2eb8a2}} to {{Gerrit|4e0c944b}} * 13:49 jgleeson: tools upgraded from {{Gerrit|f52a5dcf}} to {{Gerrit|afbd0f67}} == 2026-04-28 == * 20:58 larssandergreen: civicrm upgraded from {{Gerrit|be3bb76b}} to {{Gerrit|081d5a29}} == 2026-04-27 == * 19:04 ejegg: fundraising civicrm upgraded from {{Gerrit|3f8d49fa}} to {{Gerrit|be3bb76b}} * 19:02 ejegg: payments-wiki upgraded from {{Gerrit|b1a352af}} to {{Gerrit|5265089d}} * 18:58 ejegg: 
standalone (IPN listener) SmashPig upgraded from {{Gerrit|572b69da}} to {{Gerrit|88a1bcba}} == 2026-04-23 == * 20:40 ejegg: payments-wiki upgraded from {{Gerrit|6e78ef91}} to {{Gerrit|b1a352af}} * 19:52 ejegg: civicrm fundraising upgraded from {{Gerrit|5ea4c8d3}} to {{Gerrit|3f8d49fa}} * 19:27 ejegg: SmashPig upgraded from {{Gerrit|f1b3f3d9}} to {{Gerrit|572b69da}} * 16:32 larssandergreen: civicrm upgraded from {{Gerrit|53a0b46f}} to {{Gerrit|5ea4c8d3}} * 16:31 larssandergreen: tools upgraded from {{Gerrit|edca3f63}} to {{Gerrit|f52a5dcf}} == 2026-04-22 == * 02:10 eileen: civicrm upgraded from {{Gerrit|abd23ad7}} to {{Gerrit|53a0b46f}} == 2026-04-21 == * 23:16 cstone: civicrm upgraded from {{Gerrit|22f24ae4}} to {{Gerrit|abd23ad7}} * 19:46 larssandergreen: civicrm upgraded from {{Gerrit|ddc1f044}} to {{Gerrit|22f24ae4}} == 2026-04-20 == * 19:21 larssandergreen: tools upgraded from {{Gerrit|26ab0125}} to {{Gerrit|edca3f63}} * 15:09 ejegg: payments-wiki upgraded from {{Gerrit|86a42498}} to {{Gerrit|6e78ef91}} == 2026-04-17 == * 01:08 larssandergreen: civicrm upgraded from {{Gerrit|90c0ccd9}} to {{Gerrit|ddc1f044}} == 2026-04-16 == * 16:39 larssandergreen: tools upgraded from {{Gerrit|f14a814e}} to {{Gerrit|26ab0125}} * 14:20 larssandergreen: civicrm upgraded from {{Gerrit|801847a7}} to {{Gerrit|90c0ccd9}} * 14:19 larssandergreen: tools upgraded from {{Gerrit|9bff5f07}} to {{Gerrit|f14a814e}} * 02:39 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|61fee241}} to {{Gerrit|f1b3f3d9}} == 2026-04-15 == * 21:50 eileen: civicrm upgraded from {{Gerrit|6f33b6d0}} to {{Gerrit|801847a7}} * 06:14 eileen: civicrm upgraded from {{Gerrit|a047bf92}} to {{Gerrit|6f33b6d0}} * 01:20 eileen: SmashPig upgraded from {{Gerrit|100101fb}} to {{Gerrit|61fee241}} == 2026-04-14 == * 23:58 eileen: civicrm upgraded from {{Gerrit|eb3d73e4}} to {{Gerrit|a047bf92}} * 23:48 wfan: payments-wiki upgraded from {{Gerrit|26f5451a}} to {{Gerrit|c3b34f99}} * 22:31 eileen: civicrm upgraded 
from {{Gerrit|2058927e}} to {{Gerrit|eb3d73e4}} * 22:25 eileen: civicrm upgraded from {{Gerrit|2058927e}} to {{Gerrit|eb3d73e4}} * 18:06 ejegg: fundraising civicrm upgraded from {{Gerrit|fccf9b3a}} to {{Gerrit|2058927e}} == 2026-04-13 == * 18:20 ejegg: fundraising civicrm upgraded from {{Gerrit|fa20eb0a}} to {{Gerrit|fccf9b3a}} * 17:04 ejegg: re-enabled recurring donation charge jobs * 16:29 ejegg: fundraising civicrm upgraded from {{Gerrit|eb188fa2}} to {{Gerrit|fa20eb0a}} * 16:27 ejegg: disabled recurring donation charge jobs for code / settings update * 12:44 jgleeson: donorwiki upgraded from {{Gerrit|064a770e}} to {{Gerrit|26f5451a}} == 2026-04-10 == * 15:35 jgleeson: payments-wiki upgraded from {{Gerrit|c017d7e7}} to {{Gerrit|dd45f867}} == 2026-04-09 == * 23:31 wfan: payments-wiki upgraded from {{Gerrit|064a770e}} to {{Gerrit|c017d7e7}} * 21:49 ejegg: fundraising civicrm upgraded from {{Gerrit|3d3c0a62}} to {{Gerrit|eb188fa2}} * 19:00 larssandergreen: tools upgraded from {{Gerrit|986f7f83}} to {{Gerrit|9bff5f07}} * 13:08 jgleeson: civicrm upgraded from {{Gerrit|d8d3871c}} to {{Gerrit|3d3c0a62}} * 11:53 jgleeson: SmashPig upgraded from {{Gerrit|5c083891}} to {{Gerrit|100101fb}} * 01:20 ejegg: fundraising civicrm upgraded from {{Gerrit|e60321bb}} to {{Gerrit|d8d3871c}} == 2026-04-08 == * 17:44 ejegg: fundraising civicrm upgraded from {{Gerrit|4ee0b5e8}} to {{Gerrit|e60321bb}} * 15:12 ejegg: payments-wiki upgraded from {{Gerrit|1ad85e6c}} to {{Gerrit|064a770e}} * 01:19 ejegg: donorwiki upgraded from {{Gerrit|1ad85e6c}} to {{Gerrit|064a770e}} * 00:26 dwisehaupt: cloning new frdb frdb1008 from frdb2005 == 2026-04-07 == * 18:34 wfan: civicrm upgraded from {{Gerrit|9104e70b}} to {{Gerrit|6f762e29}} == 2026-04-06 == * 20:42 ejegg: re-enabled recurring donation charge job * 20:33 wfan: donorwiki upgraded from {{Gerrit|c2d03117}} to {{Gerrit|1ad85e6c}} * 20:32 wfan: payments-wiki upgraded from {{Gerrit|80cda166}} to {{Gerrit|1ad85e6c}} * 16:53 ejegg: disabled recurring 
donations charge job while diagnosing gr4vy routing errors * 16:03 ejegg: civicrm upgraded from {{Gerrit|4ee11209}} to {{Gerrit|9104e70b}} == 2026-04-03 == * 00:04 wfan: civicrm upgraded from {{Gerrit|49f541cd}} to {{Gerrit|4ee11209}} == 2026-04-02 == * 21:38 cstone: payments-wiki upgraded from {{Gerrit|86bec442}} to {{Gerrit|80cda166}} * 05:24 eileen: civicrm upgraded from {{Gerrit|c512abc6}} to {{Gerrit|49f541cd}} * 02:39 eileen: civicrm upgraded from {{Gerrit|bbed1291}} to {{Gerrit|c512abc6}} * 02:16 eileen: SmashPig upgraded from {{Gerrit|9af71a7c}} to {{Gerrit|18ea746a}} == 2026-04-01 == * 18:57 eileen: civicrm upgraded from {{Gerrit|a1bf4768}} to {{Gerrit|bbed1291}} * 04:11 eileen: civicrm upgraded from {{Gerrit|11a2f9ab}} to {{Gerrit|a1bf4768}} * 03:18 ejegg: payments-wiki upgraded from {{Gerrit|02bf54b0}} to {{Gerrit|86bec442}} == 2026-03-31 == * 22:03 jgleeson: tools upgraded from {{Gerrit|9985e723}} to {{Gerrit|986f7f83}} * 20:16 eileen: civicrm upgraded from {{Gerrit|c3cc3562}} to {{Gerrit|b468301c}} * 18:40 jgleeson: tools upgraded from {{Gerrit|161049ac}} to {{Gerrit|9985e723}} * 17:39 ejegg: Standalone (IPN listener) SmashPig upgraded from {{Gerrit|abf8682a}} to {{Gerrit|9af71a7c}} * 16:38 jgleeson: tools upgraded from {{Gerrit|f605b570}} to {{Gerrit|161049ac}} * 16:24 jgleeson: donorwiki updated from {{Gerrit|d79a98b5}} to {{Gerrit|c2d03117}} * 02:46 eileen: civicrm upgraded from {{Gerrit|591bef29}} to {{Gerrit|c3cc3562}} * 01:00 eileen: civicrm upgraded from {{Gerrit|cf871dd3}} to {{Gerrit|591bef29}} == 2026-03-30 == * 23:19 eileen: civicrm upgraded from {{Gerrit|7d299b48}} to {{Gerrit|cf871dd3}} * 21:09 eileen: civicrm upgraded from {{Gerrit|3724cc2d}} to {{Gerrit|7d299b48}} * 20:40 eileen: civicrm upgraded from {{Gerrit|58426b1e}} to {{Gerrit|3724cc2d}} * 14:51 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|545f0b10}} to {{Gerrit|abf8682a}} == 2026-03-28 == * 02:57 ejegg: payments-wiki upgraded from {{Gerrit|d79a98b5}} to 
{{Gerrit|b239a6b7}} * 02:27 eileen: civicrm upgraded from {{Gerrit|7138d524}} to {{Gerrit|58426b1e}} == 2026-03-27 == * 22:37 eileen: civicrm upgraded from {{Gerrit|c51e98cc}} to {{Gerrit|7138d524}} * 17:50 dwisehaupt: Corrected value: on frdb1005 running the following in mysql to up the buffer pool to ~412G: set global innodb_buffer_pool_size = 431669379072; * 17:34 dwisehaupt: on frdb1005 running the following in mysql to up the buffer pool to ~412G: set global innodb_buffer_pool_size = 421552128; == 2026-03-26 == * 20:00 jgleeson: donorwiki upgraded from {{Gerrit|48cc2e9d}} to {{Gerrit|d79a98b5}} * 19:59 jgleeson: payments-wiki upgraded from {{Gerrit|b387c6ba}} to {{Gerrit|d79a98b5}} * 19:48 ejegg: fundraising civicrm upgraded from {{Gerrit|88b497b8}} to {{Gerrit|db102c77}} * 19:25 larssandergreen: civicrm upgraded from {{Gerrit|26dab2f0}} to {{Gerrit|88b497b8}} * 16:26 jgleeson: SmashPig upgraded from {{Gerrit|5d8a0330}} to {{Gerrit|545f0b10}} * 02:13 eileen: civicrm upgraded from {{Gerrit|a752f9e8}} to {{Gerrit|26dab2f0}} * 00:05 eileen: civicrm upgraded from {{Gerrit|277fe75e}} to {{Gerrit|a752f9e8}} == 2026-03-25 == * 21:53 eileen: civicrm upgraded from {{Gerrit|97319295}} to {{Gerrit|277fe75e}} * 19:58 eileen: civicrm upgraded from {{Gerrit|d3dfedb4}} to {{Gerrit|97319295}} == 2026-03-24 == * 19:32 wfan: payments-wiki upgraded from {{Gerrit|fce3fab5}} to {{Gerrit|b387c6ba}} * 04:24 eileen: civicrm upgraded from {{Gerrit|0aef661f}} to {{Gerrit|b2ed875e}} * 03:38 eileen: config revision changed from {{Gerrit|ded0c289}} to {{Gerrit|16592428}} schedule stripe download * 03:33 eileen: config revision changed from {{Gerrit|79e052e4}} to {{Gerrit|ded0c289}} temporarily disable adyen audit parse - let's fix those misplaced IDs * 01:55 eileen: civicrm upgraded from {{Gerrit|b2c7f1d0}} to {{Gerrit|0aef661f}} * 00:08 eileen: civicrm upgraded from {{Gerrit|80344f51}} to {{Gerrit|b2c7f1d0}} == 2026-03-23 == * 21:30 eileen: config revision changed from 
{{Gerrit|8c5587f3}} to {{Gerrit|2dd50e7c}} * 18:55 wfan: civicrm upgraded from {{Gerrit|675455b2}} to {{Gerrit|80344f51}} * 17:27 larssandergreen: tools upgraded from {{Gerrit|e60f63b3}} to {{Gerrit|f605b570}} * 15:59 ejegg: civicrm upgraded from {{Gerrit|a2d4b17c}} to {{Gerrit|675455b2}} * 12:25 jgleeson: payments-wiki upgraded from {{Gerrit|48cc2e9d}} to {{Gerrit|91d9eee9}} == 2026-03-20 == * 00:34 eileen: * civicrm upgraded from {{Gerrit|adc36173}} to {{Gerrit|a2d4b17c}} * 00:31 cstone: payments-wiki upgraded from {{Gerrit|f3420a6f}} to {{Gerrit|48cc2e9d}} * 00:27 eileen: config revision changed from {{Gerrit|a7486f6a}} to {{Gerrit|a1a426f3}} * 00:27 eileen: SmashPig upgraded from {{Gerrit|78a8e70a}} to {{Gerrit|5d8a0330}} == 2026-03-19 == * 14:08 damilare: civiproxy upgraded from {{Gerrit|6625c844}} to {{Gerrit|38ba8348}} == 2026-03-17 == * 18:40 jgleeson: donorwiki upgraded from {{Gerrit|4c09db39}} to {{Gerrit|7d1666f9}} * 06:20 eileen: civicrm upgraded from {{Gerrit|7fe14629}} to {{Gerrit|adc36173}} * 05:06 eileen: civicrm upgraded from {{Gerrit|e622a222}} to {{Gerrit|7fe14629}} * 03:48 eileen: civicrm upgraded from {{Gerrit|5360f9ad}} to {{Gerrit|e622a222}} * 02:13 eileen: civicrm upgraded from {{Gerrit|3283e3ca}} to {{Gerrit|e73c6b50}} == 2026-03-15 == * 23:56 eileen: civicrm upgraded from {{Gerrit|dce257f0}} to {{Gerrit|3283e3ca}} * 19:43 eileen: civicrm upgraded from {{Gerrit|a1279ee4}} to {{Gerrit|dce257f0}} == 2026-03-13 == * ish: payments-wiki upgraded from {{Gerrit|f40a1153}} to {{Gerrit|f3420a6f}} == 2026-03-11 == * 21:20 larssandergreen: civicrm upgraded from {{Gerrit|c2c716ca}} to {{Gerrit|a1279ee4}} * 19:15 eileen: civicrm upgraded from {{Gerrit|81baf495}} to {{Gerrit|c2c716ca}} * 07:18 eileen: config revision changed from {{Gerrit|ed2295ab}} to {{Gerrit|a7486f6a}} * 07:02 eileen: civicrm upgraded from {{Gerrit|f418297f}} to {{Gerrit|81baf495}} * 02:28 eileen: civicrm upgraded from {{Gerrit|14e8200e}} to {{Gerrit|da26f37d}} * 00:06 eileen: civicrm 
upgraded from {{Gerrit|fbb38eda}} to {{Gerrit|14e8200e}} == 2026-03-10 == * 22:08 eileen: civicrm upgraded from {{Gerrit|ef319ea3}} to {{Gerrit|fbb38eda}} * 19:35 eileen: civicrm upgraded from {{Gerrit|773d9fb9}} to {{Gerrit|ef319ea3}} * 06:06 eileen: config revision changed from {{Gerrit|b9bc2a20}} to {{Gerrit|60ef6709}} == 2026-03-09 == * 17:37 damilare: payments-wiki upgraded from {{Gerrit|5b747b97}} to {{Gerrit|f40a1153}} == 2026-03-06 == * 23:39 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|ed16a2ea}} to {{Gerrit|78a8e70a}} * 21:48 ejegg: fundraising civicrm upgraded from {{Gerrit|8aadcd81}} to {{Gerrit|773d9fb9}} * 21:13 ejegg: civicrm fundraising upgraded from {{Gerrit|a1f32ed6}} to {{Gerrit|8aadcd81}} * 20:35 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|217fc7fc}} to {{Gerrit|ed16a2ea}} * 19:18 ejegg: civicrm upgraded from {{Gerrit|f2633c89}} to {{Gerrit|a1f32ed6}} * 03:08 larssandergreen: civicrm upgraded from {{Gerrit|fbac3ce7}} to {{Gerrit|2bae36fa}} * 02:09 ejegg: payments-wiki upgraded from {{Gerrit|9ae5bf60}} to {{Gerrit|5b747b97}} == 2026-03-05 == * 17:53 ejegg: donorwiki upgraded from {{Gerrit|7329b41d}} to {{Gerrit|4c09db39}} * 05:59 eileen: civicrm upgraded from {{Gerrit|11e5a5d8}} to {{Gerrit|fbac3ce7}} * 05:08 eileen: * civicrm upgraded from {{Gerrit|8bdce85f}} to {{Gerrit|11e5a5d8}} == 2026-03-04 == * 18:52 jgleeson: tools upgraded from {{Gerrit|a3568ffc}} to {{Gerrit|e60f63b3}} * 03:21 ejegg: payments-wiki upgraded from {{Gerrit|5e4939a3}} to {{Gerrit|9ae5bf60}} * 03:20 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|78960c68}} to {{Gerrit|217fc7fc}} == 2026-03-03 == * 20:42 eileen: civicrm upgraded from {{Gerrit|b610f844}} to {{Gerrit|8bdce85f}} * 17:09 dwisehaupt: latest php8.2 updates installed on civi1002 * 06:00 eileen: * civicrm upgraded from {{Gerrit|f4a70c82}} to {{Gerrit|b610f844}} == 2026-03-02 == * 20:40 eileen: cv upgraded from {{Gerrit|dfeedcbe}} to {{Gerrit|f19e0961}} * 18:41 
ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|8cd1593b}} to {{Gerrit|78960c68}} * 15:13 ejegg: donorwiki upgraded from {{Gerrit|f5d7179a}} to {{Gerrit|7329b41d}} == 2026-02-27 == * 19:00 ejegg: fundraising civicrm upgraded from {{Gerrit|162bbf7c}} to {{Gerrit|f4a70c82}} * 12:24 jgleeson: payments-wiki upgraded from {{Gerrit|f4fd71ff}} to {{Gerrit|a8a9ce78}} * 11:25 jgleeson: payments-wiki upgraded from {{Gerrit|974af222}} to {{Gerrit|f4fd71ff}} == 2026-02-26 == * 22:36 eileen: civicrm upgraded from {{Gerrit|6995dc03}} to {{Gerrit|162bbf7c}} * 21:52 ejegg: civicrm upgraded from {{Gerrit|2a6001d3}} to {{Gerrit|6995dc03}} * 17:44 larssandergreen: civicrm upgraded from {{Gerrit|3ac93e95}} to {{Gerrit|2a6001d3}} * 07:14 eileen: civicrm upgraded from {{Gerrit|c3157fbf}} to {{Gerrit|3ac93e95}} == 2026-02-25 == * 18:33 ejegg: fundraising civicrm upgraded from {{Gerrit|f881e026}} to {{Gerrit|c3157fbf}} * 03:37 eileen: config revision changed from {{Gerrit|390d6434}} to {{Gerrit|a0228e6c}} turn off trustly audit == 2026-02-24 == * 22:48 ejegg: payments-wiki upgraded from {{Gerrit|e5f73610}} to {{Gerrit|974af222}} * 19:47 ejegg: payments-wiki upgraded from {{Gerrit|f5d7179a}} to {{Gerrit|e5f73610}} * 02:06 eileen: config revision changed from {{Gerrit|71c98072}} to {{Gerrit|390d6434}} reenabled trustly audit == 2026-02-23 == * 23:42 eileen: civicrm upgraded from {{Gerrit|f0710864}} to {{Gerrit|f881e026}} * 22:28 ejegg: fundraising civicrm upgraded from {{Gerrit|9d58ce4a}} to {{Gerrit|f0710864}} * 16:43 jgleeson: payments-wiki upgraded from {{Gerrit|0127f2d8}} to {{Gerrit|f5d7179a}} == 2026-02-21 == * 01:15 dwisehaupt: updating localsettings from {{Gerrit|71c98072}} to {{Gerrit|534fbf34}} and syncing civicrm to push update for large_donation_notifications == 2026-02-20 == * 20:38 ejegg: donorwiki upgraded from {{Gerrit|f7a0ee6b}} to {{Gerrit|f5d7179a}} == 2026-02-19 == * 21:26 ejegg: payments-wiki upgraded from {{Gerrit|f7a0ee6b}} to {{Gerrit|0127f2d8}} * 
06:37 eileen: civicrm upgraded from {{Gerrit|ac30e19f}} to {{Gerrit|9d58ce4a}} == 2026-02-18 == * 22:21 dwisehaupt: disabling apache2::mod::dump_io on civicrm role for debugging 500 errors after testing. can be re-enabled by reverting commit {{Gerrit|4b1d94399}} - [[phab:T417310|T417310]] * 20:38 eileen: civicrm upgraded from {{Gerrit|f5020a85}} to {{Gerrit|ac30e19f}} * 19:22 eileen: civicrm upgraded from {{Gerrit|66d2e1dd}} to {{Gerrit|f5020a85}} * 17:48 ejegg: donorwiki upgraded from {{Gerrit|488431ec}} to {{Gerrit|f7a0ee6b}} * 16:51 dwisehaupt: enabling apache2::mod::dump_io on civicrm role for debugging 500 errors - [[phab:T417310|T417310]] * 06:02 eileen: civicrm upgraded from {{Gerrit|4c3cdcde}} to {{Gerrit|66d2e1dd}} * 05:33 eileen: config revision changed from {{Gerrit|605c6946}} to {{Gerrit|368156fa}} * 05:21 eileen: civicrm upgraded from {{Gerrit|caad5ab9}} to {{Gerrit|4c3cdcde}} == 2026-02-17 == * 17:34 larssandergreen: payments-wiki upgraded from {{Gerrit|c506d590}} to {{Gerrit|488431ec}} * 17:33 larssandergreen: donorwiki upgraded from {{Gerrit|93e0d03f}} to {{Gerrit|488431ec}} * 03:02 eileen: civicrm upgraded from {{Gerrit|89782fc6}} to {{Gerrit|caad5ab9}} == 2026-02-16 == * 20:16 eileen: civicrm upgraded from {{Gerrit|2b227403}} to {{Gerrit|89782fc6}} == 2026-02-15 == * 23:55 eileen: civicrm upgraded from {{Gerrit|de8252c7}} to {{Gerrit|2b227403}} * 21:07 eileen: civicrm upgraded from {{Gerrit|038f5bca}} to {{Gerrit|de8252c7}} == 2026-02-13 == * 16:03 ejegg: payments-wiki upgraded from {{Gerrit|5793a405}} to {{Gerrit|c506d590}} * away: SmashPig upgraded from {{Gerrit|fea03fcc}} to {{Gerrit|8cd1593b}} == 2026-02-12 == * 22:49 eileen: civicrm upgraded from {{Gerrit|c6c0d453}} to {{Gerrit|038f5bca}} * 21:45 jgleeson: payments-wiki upgraded from {{Gerrit|9dbf0ece}} to {{Gerrit|5793a405}} * 21:15 jgleeson: payments-wiki upgraded from {{Gerrit|6c1a522f}} to {{Gerrit|9dbf0ece}} * 19:03 larssandergreen: tools upgraded from {{Gerrit|645cf5dc}} to 
{{Gerrit|a3568ffc}} * 00:34 larssandergreen: civicrm upgraded from {{Gerrit|e13111f3}} to {{Gerrit|c6c0d453}} == 2026-02-11 == * 21:12 eileen: civicrm upgraded from {{Gerrit|6e57071a}} to {{Gerrit|e13111f3}} * 06:27 eileen: civicrm upgraded from {{Gerrit|98c325dd}} to {{Gerrit|6e57071a}} * 04:55 larssandergreen: tools upgraded from {{Gerrit|7462b8bd}} to {{Gerrit|645cf5dc}} * 03:37 eileen: civicrm upgraded from {{Gerrit|953cf9f2}} to {{Gerrit|98c325dd}} == 2026-02-09 == * 15:35 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|937d6e40}} to {{Gerrit|fea03fcc}} == 2026-02-08 == * 23:47 eileen: civicrm upgraded from {{Gerrit|bc3a8036}} to {{Gerrit|953cf9f2}} * 20:46 eileen: civicrm upgraded from {{Gerrit|40189afa}} to {{Gerrit|bc3a8036}} == 2026-02-06 == * 15:41 ejegg: fundraising civicrm upgraded from {{Gerrit|06842fbf}} to {{Gerrit|40189afa}} * 06:24 eileen: civicrm upgraded from {{Gerrit|b63d7146}} to {{Gerrit|06842fbf}} == 2026-02-05 == * 15:25 dwisehaupt: all eqiad hosts powered down and ready for relocation. * 15:03 dwisehaupt: starting poweroff of eqiad hosts * 14:48 dwisehaupt: downtimes scheduled for frack eqiad hosts and cross colo replication in prep for rack expansion - [[phab:T403035|T403035]] * 04:30 eileen: civicrm upgraded from {{Gerrit|000dd548}} to {{Gerrit|b63d7146}} == 2026-02-04 == * 23:56 eileen: civicrm upgraded from {{Gerrit|4c2870c5}} to {{Gerrit|000dd548}} * 23:47 cstone: donorwiki upgraded from {{Gerrit|53bfb05b}} to {{Gerrit|93e0d03f}} * 21:51 ejegg: payments-wiki upgraded from {{Gerrit|a09a4f8f}} to {{Gerrit|93e0d03f}} * 21:46 eileen: civicrm upgraded from {{Gerrit|dd10342d}} to {{Gerrit|4c2870c5}} * 21:22 eileen: civicrm upgraded from {{Gerrit|8aa9274a}} to {{Gerrit|dd10342d}} * 03:48 eileen: * civicrm upgraded from {{Gerrit|14c7b7e7}} to {{Gerrit|8aa9274a}} == 2026-02-03 == * 21:55 dwisehaupt: as part of dns cleanup, we have removed the old civi1002.wikimedia.org entry. 
folks should be using civicrm.wm.o but there is a chance of super old bookmarks still being around. we can reinstate if it becomes an issue. * 21:05 eileen: config revision changed from {{Gerrit|23b2d9b6}} to {{Gerrit|45d40cf1}} * 21:01 larssandergreen: civicrm upgraded from {{Gerrit|10ab3659}} to {{Gerrit|14c7b7e7}} * 05:49 eileen: config revision changed from {{Gerrit|f348441d}} to {{Gerrit|23b2d9b6}} * 05:48 eileen: civicrm upgraded from {{Gerrit|a097bb3d}} to {{Gerrit|10ab3659}} * 01:16 wfan: civicrm upgraded from {{Gerrit|5b9f3cd4}} to {{Gerrit|a097bb3d}} * 00:24 wfan: donorwiki upgraded from {{Gerrit|3ffc70f0}} to {{Gerrit|53bfb05b}} == 2026-02-02 == * 23:50 wfan: payments-wiki upgraded from {{Gerrit|c035aa84}} to {{Gerrit|53bfb05b}} * 20:22 cstone: civicrm upgraded from {{Gerrit|f91f955b}} to {{Gerrit|5b9f3cd4}} * 18:55 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|b42f2de4}} to {{Gerrit|937d6e40}} * 17:01 wfan: payments-wiki upgraded from {{Gerrit|c5cadd72}} to {{Gerrit|c035aa84}} * 15:26 ejegg: fundraising civicrm upgraded from {{Gerrit|611d18de}} to {{Gerrit|f91f955b}} == 2026-01-30 == * 17:02 larssandergreen: civicrm upgraded from {{Gerrit|79e4424e}} to {{Gerrit|611d18de}} * 04:18 cstone: civicrm upgraded from {{Gerrit|fe1af57a}} to {{Gerrit|79e4424e}} * 02:51 eileen: civicrm upgraded from {{Gerrit|ff772bee}} to {{Gerrit|fe1af57a}} * 02:28 eileen: civicrm upgraded from {{Gerrit|bcf976ae}} to {{Gerrit|ff772bee}} == 2026-01-29 == * 22:16 eileen: civicrm upgraded from {{Gerrit|5d121c63}} to {{Gerrit|bcf976ae}} * 20:17 eileen: civicrm upgraded from {{Gerrit|ebcfd009}} to {{Gerrit|5d121c63}} * 20:09 eileen: civicrm upgraded from {{Gerrit|5c065c4e}} to {{Gerrit|ebcfd009}} * 18:08 wfan: payments-wiki upgraded from {{Gerrit|81d9f614}} to {{Gerrit|c5cadd72}} == 2026-01-28 == * 19:36 jgleeson: SmashPig upgraded from {{Gerrit|96a6224d}} to {{Gerrit|b42f2de4}} * 18:53 jgleeson: civicrm upgraded from {{Gerrit|56c222da}} to {{Gerrit|5c065c4e}} * 
18:53 jgleeson: SmashPig upgraded from {{Gerrit|96a6224d}} to {{Gerrit|b42f2de4}} * 14:41 jgleeson: payments-wiki upgraded from {{Gerrit|24915bdb}} to {{Gerrit|81d9f614}} * 07:30 eileen: civicrm upgraded from {{Gerrit|600b21a6}} to {{Gerrit|56c222da}} * 06:51 eileen: * civicrm upgraded from {{Gerrit|32f9a10d}} to {{Gerrit|600b21a6}} * 04:35 eileen: config revision changed from {{Gerrit|ed0808a9}} to {{Gerrit|ef6ef5f2}} * 03:42 eileen: civicrm upgraded from {{Gerrit|64267a34}} to {{Gerrit|32f9a10d}} * 02:13 eileen: civicrm upgraded from {{Gerrit|7299615a}} to {{Gerrit|64267a34}} == 2026-01-27 == * 21:12 larssandergreen: tools upgraded from {{Gerrit|84323460}} to {{Gerrit|7462b8bd}} * 05:57 cstone: civicrm upgraded from {{Gerrit|19f94835}} to {{Gerrit|75f443b5}} == 2026-01-26 == * 17:15 damilare: smashpig upgraded from {{Gerrit|8b4ebf34}} to {{Gerrit|96a6224d}} * 16:32 larssandergreen: tools upgraded from {{Gerrit|c75f7625}} to {{Gerrit|84323460}} * 01:43 eileen: config revision changed from {{Gerrit|2f71107f}} to {{Gerrit|ed0808a9}} switch to php dlocal downloader (now weekend is mostly over) - == 2026-01-25 == * 21:47 eileen: config revision changed from {{Gerrit|23023984}} to {{Gerrit|2f71107f}} == 2026-01-24 == * 02:19 cstone: civicrm upgraded from {{Gerrit|f7064a46}} to {{Gerrit|19f94835}} * 00:55 bd808: Testing #wikimedia-fundraising SAL integration ([[phab:T415389|T415389]]) <noinclude>[[Category:SAL]]</noinclude> de1bv6fnjxspcm79qm8loml3z9by7yn 2414290 2414275 2026-05-15T19:19:03Z Stashbot 7414 dwisehaupt: redis swap complete from frqueue1003 to frqueue1005 2414290 wikitext text/x-wiki == 2026-05-15 == * 19:19 dwisehaupt: redis swap complete from frqueue1003 to frqueue1005 * 15:59 ejegg: donorwiki upgraded from {{Gerrit|26f5451a}} to {{Gerrit|1a056dc0}} * 15:35 ejegg: payments-wiki upgraded from {{Gerrit|cf9ec80b}} to {{Gerrit|1a056dc0}} * 00:02 eileen: civicrm upgraded from {{Gerrit|6d8ce7a3}} to {{Gerrit|6a2258ff}} == 2026-05-14 == * 22:22 ejegg: 
fundraising scheduled jobs re-enabled * 22:12 eileen: cv upgraded from {{Gerrit|f19e0961}} to {{Gerrit|b8a8dd6a}} * 22:11 ejegg: fundraising civicrm upgraded from {{Gerrit|e25fa223}} to {{Gerrit|6d8ce7a3}} * 22:09 ejegg: fundraising scheduled jobs disabled for Civi update * 19:29 ejegg: re-enabled fundraising scheduled jobs * 19:06 ejegg: fundraising civicrm upgraded from {{Gerrit|950908ec}} to {{Gerrit|e25fa223}} * 19:04 ejegg: disabled fundraising scheduled jobs for CiviCRM deployment * 16:48 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|bb833986}} to {{Gerrit|a2a3a015}} == 2026-05-12 == * 16:07 ejegg: fundraising civicrm upgraded from {{Gerrit|24ac90e7}} to {{Gerrit|60dcc28f}} == 2026-05-06 == * 12:29 eileen: config revision changed from {{Gerrit|41cfd677}} to {{Gerrit|00752f91}} * 10:56 eileen: SmashPig upgraded from {{Gerrit|4201ef56}} to {{Gerrit|bb833986}} * 09:59 eileen: civicrm upgraded from {{Gerrit|4d9c8600}} to {{Gerrit|24ac90e7}} * 09:15 eileen: civicrm upgraded from {{Gerrit|38dcf7a8}} to {{Gerrit|4d9c8600}} == 2026-05-04 == * 17:39 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|1be60746}} to {{Gerrit|4201ef56}} == 2026-05-03 == * hackathon: civicrm upgraded from {{Gerrit|0afbc8ea}} to {{Gerrit|38dcf7a8}} == 2026-05-02 == * 09:44 eileen: civicrm upgraded from {{Gerrit|7556c5c7}} to {{Gerrit|0afbc8ea}} * 09:43 eileen: SmashPig upgraded from {{Gerrit|88a1bcba}} to {{Gerrit|1be60746}} == 2026-05-01 == * 16:36 eileen: civicrm upgraded from {{Gerrit|1a835879}} to {{Gerrit|7556c5c7}} * 15:08 eileen: civicrm upgraded from {{Gerrit|9ed32632}} to {{Gerrit|1a835879}} * 14:02 eileen: civicrm upgraded from {{Gerrit|081d5a29}} to {{Gerrit|9ed32632}} == 2026-04-29 == * 13:51 jgleeson: payments-wiki upgraded from {{Gerrit|2e2eb8a2}} to {{Gerrit|4e0c944b}} * 13:49 jgleeson: tools upgraded from {{Gerrit|f52a5dcf}} to {{Gerrit|afbd0f67}} == 2026-04-28 == * 20:58 larssandergreen: civicrm upgraded from {{Gerrit|be3bb76b}} to 
{{Gerrit|081d5a29}} == 2026-04-27 == * 19:04 ejegg: fundraising civicrm upgraded from {{Gerrit|3f8d49fa}} to {{Gerrit|be3bb76b}} * 19:02 ejegg: payments-wiki upgraded from {{Gerrit|b1a352af}} to {{Gerrit|5265089d}} * 18:58 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|572b69da}} to {{Gerrit|88a1bcba}} == 2026-04-23 == * 20:40 ejegg: payments-wiki upgraded from {{Gerrit|6e78ef91}} to {{Gerrit|b1a352af}} * 19:52 ejegg: civicrm fundraising upgraded from {{Gerrit|5ea4c8d3}} to {{Gerrit|3f8d49fa}} * 19:27 ejegg: SmashPig upgraded from {{Gerrit|f1b3f3d9}} to {{Gerrit|572b69da}} * 16:32 larssandergreen: civicrm upgraded from {{Gerrit|53a0b46f}} to {{Gerrit|5ea4c8d3}} * 16:31 larssandergreen: tools upgraded from {{Gerrit|edca3f63}} to {{Gerrit|f52a5dcf}} == 2026-04-22 == * 02:10 eileen: civicrm upgraded from {{Gerrit|abd23ad7}} to {{Gerrit|53a0b46f}} == 2026-04-21 == * 23:16 cstone: civicrm upgraded from {{Gerrit|22f24ae4}} to {{Gerrit|abd23ad7}} * 19:46 larssandergreen: civicrm upgraded from {{Gerrit|ddc1f044}} to {{Gerrit|22f24ae4}} == 2026-04-20 == * 19:21 larssandergreen: tools upgraded from {{Gerrit|26ab0125}} to {{Gerrit|edca3f63}} * 15:09 ejegg: payments-wiki upgraded from {{Gerrit|86a42498}} to {{Gerrit|6e78ef91}} == 2026-04-17 == * 01:08 larssandergreen: civicrm upgraded from {{Gerrit|90c0ccd9}} to {{Gerrit|ddc1f044}} == 2026-04-16 == * 16:39 larssandergreen: tools upgraded from {{Gerrit|f14a814e}} to {{Gerrit|26ab0125}} * 14:20 larssandergreen: civicrm upgraded from {{Gerrit|801847a7}} to {{Gerrit|90c0ccd9}} * 14:19 larssandergreen: tools upgraded from {{Gerrit|9bff5f07}} to {{Gerrit|f14a814e}} * 02:39 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|61fee241}} to {{Gerrit|f1b3f3d9}} == 2026-04-15 == * 21:50 eileen: civicrm upgraded from {{Gerrit|6f33b6d0}} to {{Gerrit|801847a7}} * 06:14 eileen: civicrm upgraded from {{Gerrit|a047bf92}} to {{Gerrit|6f33b6d0}} * 01:20 eileen: SmashPig upgraded from {{Gerrit|100101fb}} to 
{{Gerrit|61fee241}} == 2026-04-14 == * 23:58 eileen: civicrm upgraded from {{Gerrit|eb3d73e4}} to {{Gerrit|a047bf92}} * 23:48 wfan: payments-wiki upgraded from {{Gerrit|26f5451a}} to {{Gerrit|c3b34f99}} * 22:31 eileen: civicrm upgraded from {{Gerrit|2058927e}} to {{Gerrit|eb3d73e4}} * 22:25 eileen: civicrm upgraded from {{Gerrit|2058927e}} to {{Gerrit|eb3d73e4}} * 18:06 ejegg: fundraising civicrm upgraded from {{Gerrit|fccf9b3a}} to {{Gerrit|2058927e}} == 2026-04-13 == * 18:20 ejegg: fundraising civicrm upgraded from {{Gerrit|fa20eb0a}} to {{Gerrit|fccf9b3a}} * 17:04 ejegg: re-enabled recurring donation charge jobs * 16:29 ejegg: fundraising civicrm upgraded from {{Gerrit|eb188fa2}} to {{Gerrit|fa20eb0a}} * 16:27 ejegg: disabled recurring donation charge jobs for code / settings update * 12:44 jgleeson: donorwiki upgraded from {{Gerrit|064a770e}} to {{Gerrit|26f5451a}} == 2026-04-10 == * 15:35 jgleeson: payments-wiki upgraded from {{Gerrit|c017d7e7}} to {{Gerrit|dd45f867}} == 2026-04-09 == * 23:31 wfan: payments-wiki upgraded from {{Gerrit|064a770e}} to {{Gerrit|c017d7e7}} * 21:49 ejegg: fundraising civicrm upgraded from {{Gerrit|3d3c0a62}} to {{Gerrit|eb188fa2}} * 19:00 larssandergreen: tools upgraded from {{Gerrit|986f7f83}} to {{Gerrit|9bff5f07}} * 13:08 jgleeson: civicrm upgraded from {{Gerrit|d8d3871c}} to {{Gerrit|3d3c0a62}} * 11:53 jgleeson: SmashPig upgraded from {{Gerrit|5c083891}} to {{Gerrit|100101fb}} * 01:20 ejegg: fundraising civicrm upgraded from {{Gerrit|e60321bb}} to {{Gerrit|d8d3871c}} == 2026-04-08 == * 17:44 ejegg: fundraising civicrm upgraded from {{Gerrit|4ee0b5e8}} to {{Gerrit|e60321bb}} * 15:12 ejegg: payments-wiki upgraded from {{Gerrit|1ad85e6c}} to {{Gerrit|064a770e}} * 01:19 ejegg: donorwiki upgraded from {{Gerrit|1ad85e6c}} to {{Gerrit|064a770e}} * 00:26 dwisehaupt: cloning new frdb frdb1008 from frdb2005 == 2026-04-07 == * 18:34 wfan: civicrm upgraded from {{Gerrit|9104e70b}} to {{Gerrit|6f762e29}} == 2026-04-06 == * 20:42 ejegg: 
re-enabled recurring donation charge job * 20:33 wfan: donorwiki upgraded from {{Gerrit|c2d03117}} to {{Gerrit|1ad85e6c}} * 20:32 wfan: payments-wiki upgraded from {{Gerrit|80cda166}} to {{Gerrit|1ad85e6c}} * 16:53 ejegg: disabled recurring donations charge job while diagnosing gr4vy routing errors * 16:03 ejegg: civicrm upgraded from {{Gerrit|4ee11209}} to {{Gerrit|9104e70b}} == 2026-04-03 == * 00:04 wfan: civicrm upgraded from {{Gerrit|49f541cd}} to {{Gerrit|4ee11209}} == 2026-04-02 == * 21:38 cstone: payments-wiki upgraded from {{Gerrit|86bec442}} to {{Gerrit|80cda166}} * 05:24 eileen: civicrm upgraded from {{Gerrit|c512abc6}} to {{Gerrit|49f541cd}} * 02:39 eileen: civicrm upgraded from {{Gerrit|bbed1291}} to {{Gerrit|c512abc6}} * 02:16 eileen: SmashPig upgraded from {{Gerrit|9af71a7c}} to {{Gerrit|18ea746a}} == 2026-04-01 == * 18:57 eileen: civicrm upgraded from {{Gerrit|a1bf4768}} to {{Gerrit|bbed1291}} * 04:11 eileen: civicrm upgraded from {{Gerrit|11a2f9ab}} to {{Gerrit|a1bf4768}} * 03:18 ejegg: payments-wiki upgraded from {{Gerrit|02bf54b0}} to {{Gerrit|86bec442}} == 2026-03-31 == * 22:03 jgleeson: tools upgraded from {{Gerrit|9985e723}} to {{Gerrit|986f7f83}} * 20:16 eileen: civicrm upgraded from {{Gerrit|c3cc3562}} to {{Gerrit|b468301c}} * 18:40 jgleeson: tools upgraded from {{Gerrit|161049ac}} to {{Gerrit|9985e723}} * 17:39 ejegg: Standalone (IPN listener) SmashPig upgraded from {{Gerrit|abf8682a}} to {{Gerrit|9af71a7c}} * 16:38 jgleeson: tools upgraded from {{Gerrit|f605b570}} to {{Gerrit|161049ac}} * 16:24 jgleeson: donorwiki updated from {{Gerrit|d79a98b5}} to {{Gerrit|c2d03117}} * 02:46 eileen: civicrm upgraded from {{Gerrit|591bef29}} to {{Gerrit|c3cc3562}} * 01:00 eileen: civicrm upgraded from {{Gerrit|cf871dd3}} to {{Gerrit|591bef29}} == 2026-03-30 == * 23:19 eileen: civicrm upgraded from {{Gerrit|7d299b48}} to {{Gerrit|cf871dd3}} * 21:09 eileen: civicrm upgraded from {{Gerrit|3724cc2d}} to {{Gerrit|7d299b48}} * 20:40 eileen: civicrm upgraded from 
{{Gerrit|58426b1e}} to {{Gerrit|3724cc2d}} * 14:51 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|545f0b10}} to {{Gerrit|abf8682a}} == 2026-03-28 == * 02:57 ejegg: payments-wiki upgraded from {{Gerrit|d79a98b5}} to {{Gerrit|b239a6b7}} * 02:27 eileen: civicrm upgraded from {{Gerrit|7138d524}} to {{Gerrit|58426b1e}} == 2026-03-27 == * 22:37 eileen: civicrm upgraded from {{Gerrit|c51e98cc}} to {{Gerrit|7138d524}} * 17:50 dwisehaupt: Corrected value: on frdb1005 running the following in mysql to up the buffer pool to ~412G: set global innodb_buffer_pool_size = 431669379072; * 17:34 dwisehaupt: on frdb1005 running the following in mysql to up the buffer pool to ~412G: set global innodb_buffer_pool_size = 421552128; == 2026-03-26 == * 20:00 jgleeson: donorwiki upgraded from {{Gerrit|48cc2e9d}} to {{Gerrit|d79a98b5}} * 19:59 jgleeson: payments-wiki upgraded from {{Gerrit|b387c6ba}} to {{Gerrit|d79a98b5}} * 19:48 ejegg: fundraising civicrm upgraded from {{Gerrit|88b497b8}} to {{Gerrit|db102c77}} * 19:25 larssandergreen: civicrm upgraded from {{Gerrit|26dab2f0}} to {{Gerrit|88b497b8}} * 16:26 jgleeson: SmashPig upgraded from {{Gerrit|5d8a0330}} to {{Gerrit|545f0b10}} * 02:13 eileen: civicrm upgraded from {{Gerrit|a752f9e8}} to {{Gerrit|26dab2f0}} * 00:05 eileen: civicrm upgraded from {{Gerrit|277fe75e}} to {{Gerrit|a752f9e8}} == 2026-03-25 == * 21:53 eileen: civicrm upgraded from {{Gerrit|97319295}} to {{Gerrit|277fe75e}} * 19:58 eileen: civicrm upgraded from {{Gerrit|d3dfedb4}} to {{Gerrit|97319295}} == 2026-03-24 == * 19:32 wfan: payments-wiki upgraded from {{Gerrit|fce3fab5}} to {{Gerrit|b387c6ba}} * 04:24 eileen: civicrm upgraded from {{Gerrit|0aef661f}} to {{Gerrit|b2ed875e}} * 03:38 eileen: config revision changed from {{Gerrit|ded0c289}} to {{Gerrit|16592428}} schedule stripe download * 03:33 eileen: config revision changed from {{Gerrit|79e052e4}} to {{Gerrit|ded0c289}} temporarily disable adyen audit parse - let's fix those misplaced IDs * 01:55 
eileen: civicrm upgraded from {{Gerrit|b2c7f1d0}} to {{Gerrit|0aef661f}} * 00:08 eileen: civicrm upgraded from {{Gerrit|80344f51}} to {{Gerrit|b2c7f1d0}} == 2026-03-23 == * 21:30 eileen: config revision changed from {{Gerrit|8c5587f3}} to {{Gerrit|2dd50e7c}} * 18:55 wfan: civicrm upgraded from {{Gerrit|675455b2}} to {{Gerrit|80344f51}} * 17:27 larssandergreen: tools upgraded from {{Gerrit|e60f63b3}} to {{Gerrit|f605b570}} * 15:59 ejegg: civicrm upgraded from {{Gerrit|a2d4b17c}} to {{Gerrit|675455b2}} * 12:25 jgleeson: payments-wiki upgraded from {{Gerrit|48cc2e9d}} to {{Gerrit|91d9eee9}} == 2026-03-20 == * 00:34 eileen: * civicrm upgraded from {{Gerrit|adc36173}} to {{Gerrit|a2d4b17c}} * 00:31 cstone: payments-wiki upgraded from {{Gerrit|f3420a6f}} to {{Gerrit|48cc2e9d}} * 00:27 eileen: config revision changed from {{Gerrit|a7486f6a}} to {{Gerrit|a1a426f3}} * 00:27 eileen: SmashPig upgraded from {{Gerrit|78a8e70a}} to {{Gerrit|5d8a0330}} == 2026-03-19 == * 14:08 damilare: civiproxy upgraded from {{Gerrit|6625c844}} to {{Gerrit|38ba8348}} == 2026-03-17 == * 18:40 jgleeson: donorwiki upgraded from {{Gerrit|4c09db39}} to {{Gerrit|7d1666f9}} * 06:20 eileen: civicrm upgraded from {{Gerrit|7fe14629}} to {{Gerrit|adc36173}} * 05:06 eileen: civicrm upgraded from {{Gerrit|e622a222}} to {{Gerrit|7fe14629}} * 03:48 eileen: civicrm upgraded from {{Gerrit|5360f9ad}} to {{Gerrit|e622a222}} * 02:13 eileen: civicrm upgraded from {{Gerrit|3283e3ca}} to {{Gerrit|e73c6b50}} == 2026-03-15 == * 23:56 eileen: civicrm upgraded from {{Gerrit|dce257f0}} to {{Gerrit|3283e3ca}} * 19:43 eileen: civicrm upgraded from {{Gerrit|a1279ee4}} to {{Gerrit|dce257f0}} == 2026-03-13 == * ish: payments-wiki upgraded from {{Gerrit|f40a1153}} to {{Gerrit|f3420a6f}} == 2026-03-11 == * 21:20 larssandergreen: civicrm upgraded from {{Gerrit|c2c716ca}} to {{Gerrit|a1279ee4}} * 19:15 eileen: civicrm upgraded from {{Gerrit|81baf495}} to {{Gerrit|c2c716ca}} * 07:18 eileen: config revision changed from 
{{Gerrit|ed2295ab}} to {{Gerrit|a7486f6a}} * 07:02 eileen: civicrm upgraded from {{Gerrit|f418297f}} to {{Gerrit|81baf495}} * 02:28 eileen: civicrm upgraded from {{Gerrit|14e8200e}} to {{Gerrit|da26f37d}} * 00:06 eileen: civicrm upgraded from {{Gerrit|fbb38eda}} to {{Gerrit|14e8200e}} == 2026-03-10 == * 22:08 eileen: civicrm upgraded from {{Gerrit|ef319ea3}} to {{Gerrit|fbb38eda}} * 19:35 eileen: civicrm upgraded from {{Gerrit|773d9fb9}} to {{Gerrit|ef319ea3}} * 06:06 eileen: config revision changed from {{Gerrit|b9bc2a20}} to {{Gerrit|60ef6709}} == 2026-03-09 == * 17:37 damilare: payments-wiki upgraded from {{Gerrit|5b747b97}} to {{Gerrit|f40a1153}} == 2026-03-06 == * 23:39 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|ed16a2ea}} to {{Gerrit|78a8e70a}} * 21:48 ejegg: fundraising civicrm upgraded from {{Gerrit|8aadcd81}} to {{Gerrit|773d9fb9}} * 21:13 ejegg: civicrm fundraising upgraded from {{Gerrit|a1f32ed6}} to {{Gerrit|8aadcd81}} * 20:35 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|217fc7fc}} to {{Gerrit|ed16a2ea}} * 19:18 ejegg: civicrm upgraded from {{Gerrit|f2633c89}} to {{Gerrit|a1f32ed6}} * 03:08 larssandergreen: civicrm upgraded from {{Gerrit|fbac3ce7}} to {{Gerrit|2bae36fa}} * 02:09 ejegg: payments-wiki upgraded from {{Gerrit|9ae5bf60}} to {{Gerrit|5b747b97}} == 2026-03-05 == * 17:53 ejegg: donorwiki upgraded from {{Gerrit|7329b41d}} to {{Gerrit|4c09db39}} * 05:59 eileen: civicrm upgraded from {{Gerrit|11e5a5d8}} to {{Gerrit|fbac3ce7}} * 05:08 eileen: * civicrm upgraded from {{Gerrit|8bdce85f}} to {{Gerrit|11e5a5d8}} == 2026-03-04 == * 18:52 jgleeson: tools upgraded from {{Gerrit|a3568ffc}} to {{Gerrit|e60f63b3}} * 03:21 ejegg: payments-wiki upgraded from {{Gerrit|5e4939a3}} to {{Gerrit|9ae5bf60}} * 03:20 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|78960c68}} to {{Gerrit|217fc7fc}} == 2026-03-03 == * 20:42 eileen: civicrm upgraded from {{Gerrit|b610f844}} to {{Gerrit|8bdce85f}} * 17:09 
dwisehaupt: latest php8.2 updates installed on civi1002 * 06:00 eileen: * civicrm upgraded from {{Gerrit|f4a70c82}} to {{Gerrit|b610f844}} == 2026-03-02 == * 20:40 eileen: cv upgraded from {{Gerrit|dfeedcbe}} to {{Gerrit|f19e0961}} * 18:41 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|8cd1593b}} to {{Gerrit|78960c68}} * 15:13 ejegg: donorwiki upgraded from {{Gerrit|f5d7179a}} to {{Gerrit|7329b41d}} == 2026-02-27 == * 19:00 ejegg: fundraising civicrm upgraded from {{Gerrit|162bbf7c}} to {{Gerrit|f4a70c82}} * 12:24 jgleeson: payments-wiki upgraded from {{Gerrit|f4fd71ff}} to {{Gerrit|a8a9ce78}} * 11:25 jgleeson: payments-wiki upgraded from {{Gerrit|974af222}} to {{Gerrit|f4fd71ff}} == 2026-02-26 == * 22:36 eileen: civicrm upgraded from {{Gerrit|6995dc03}} to {{Gerrit|162bbf7c}} * 21:52 ejegg: civicrm upgraded from {{Gerrit|2a6001d3}} to {{Gerrit|6995dc03}} * 17:44 larssandergreen: civicrm upgraded from {{Gerrit|3ac93e95}} to {{Gerrit|2a6001d3}} * 07:14 eileen: civicrm upgraded from {{Gerrit|c3157fbf}} to {{Gerrit|3ac93e95}} == 2026-02-25 == * 18:33 ejegg: fundraising civicrm upgraded from {{Gerrit|f881e026}} to {{Gerrit|c3157fbf}} * 03:37 eileen: config revision changed from {{Gerrit|390d6434}} to {{Gerrit|a0228e6c}} turn off trustly audit == 2026-02-24 == * 22:48 ejegg: payments-wiki upgraded from {{Gerrit|e5f73610}} to {{Gerrit|974af222}} * 19:47 ejegg: payments-wiki upgraded from {{Gerrit|f5d7179a}} to {{Gerrit|e5f73610}} * 02:06 eileen: config revision changed from {{Gerrit|71c98072}} to {{Gerrit|390d6434}} reenabled trustly audit == 2026-02-23 == * 23:42 eileen: civicrm upgraded from {{Gerrit|f0710864}} to {{Gerrit|f881e026}} * 22:28 ejegg: fundraising civicrm upgraded from {{Gerrit|9d58ce4a}} to {{Gerrit|f0710864}} * 16:43 jgleeson: payments-wiki upgraded from {{Gerrit|0127f2d8}} to {{Gerrit|f5d7179a}} == 2026-02-21 == * 01:15 dwisehaupt: updating localsettings from {{Gerrit|71c98072}} to {{Gerrit|534fbf34}} and syncing civicrm to push update 
for large_donation_notifications == 2026-02-20 == * 20:38 ejegg: donorwiki upgraded from {{Gerrit|f7a0ee6b}} to {{Gerrit|f5d7179a}} == 2026-02-19 == * 21:26 ejegg: payments-wiki upgraded from {{Gerrit|f7a0ee6b}} to {{Gerrit|0127f2d8}} * 06:37 eileen: civicrm upgraded from {{Gerrit|ac30e19f}} to {{Gerrit|9d58ce4a}} == 2026-02-18 == * 22:21 dwisehaupt: disabling apache2::mod::dump_io on civicrm role for debugging 500 errors after testing. can be re-enabled by reverting commit {{Gerrit|4b1d94399}} - [[phab:T417310|T417310]] * 20:38 eileen: civicrm upgraded from {{Gerrit|f5020a85}} to {{Gerrit|ac30e19f}} * 19:22 eileen: civicrm upgraded from {{Gerrit|66d2e1dd}} to {{Gerrit|f5020a85}} * 17:48 ejegg: donorwiki upgraded from {{Gerrit|488431ec}} to {{Gerrit|f7a0ee6b}} * 16:51 dwisehaupt: enabling apache2::mod::dump_io on civicrm role for debugging 500 errors - [[phab:T417310|T417310]] * 06:02 eileen: civicrm upgraded from {{Gerrit|4c3cdcde}} to {{Gerrit|66d2e1dd}} * 05:33 eileen: config revision changed from {{Gerrit|605c6946}} to {{Gerrit|368156fa}} * 05:21 eileen: civicrm upgraded from {{Gerrit|caad5ab9}} to {{Gerrit|4c3cdcde}} == 2026-02-17 == * 17:34 larssandergreen: payments-wiki upgraded from {{Gerrit|c506d590}} to {{Gerrit|488431ec}} * 17:33 larssandergreen: donorwiki upgraded from {{Gerrit|93e0d03f}} to {{Gerrit|488431ec}} * 03:02 eileen: civicrm upgraded from {{Gerrit|89782fc6}} to {{Gerrit|caad5ab9}} == 2026-02-16 == * 20:16 eileen: civicrm upgraded from {{Gerrit|2b227403}} to {{Gerrit|89782fc6}} == 2026-02-15 == * 23:55 eileen: civicrm upgraded from {{Gerrit|de8252c7}} to {{Gerrit|2b227403}} * 21:07 eileen: civicrm upgraded from {{Gerrit|038f5bca}} to {{Gerrit|de8252c7}} == 2026-02-13 == * 16:03 ejegg: payments-wiki upgraded from {{Gerrit|5793a405}} to {{Gerrit|c506d590}} * away: SmashPig upgraded from {{Gerrit|fea03fcc}} to {{Gerrit|8cd1593b}} == 2026-02-12 == * 22:49 eileen: civicrm upgraded from {{Gerrit|c6c0d453}} to {{Gerrit|038f5bca}} * 21:45 jgleeson: 
payments-wiki upgraded from {{Gerrit|9dbf0ece}} to {{Gerrit|5793a405}} * 21:15 jgleeson: payments-wiki upgraded from {{Gerrit|6c1a522f}} to {{Gerrit|9dbf0ece}} * 19:03 larssandergreen: tools upgraded from {{Gerrit|645cf5dc}} to {{Gerrit|a3568ffc}} * 00:34 larssandergreen: civicrm upgraded from {{Gerrit|e13111f3}} to {{Gerrit|c6c0d453}} == 2026-02-11 == * 21:12 eileen: civicrm upgraded from {{Gerrit|6e57071a}} to {{Gerrit|e13111f3}} * 06:27 eileen: civicrm upgraded from {{Gerrit|98c325dd}} to {{Gerrit|6e57071a}} * 04:55 larssandergreen: tools upgraded from {{Gerrit|7462b8bd}} to {{Gerrit|645cf5dc}} * 03:37 eileen: civicrm upgraded from {{Gerrit|953cf9f2}} to {{Gerrit|98c325dd}} == 2026-02-09 == * 15:35 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|937d6e40}} to {{Gerrit|fea03fcc}} == 2026-02-08 == * 23:47 eileen: civicrm upgraded from {{Gerrit|bc3a8036}} to {{Gerrit|953cf9f2}} * 20:46 eileen: civicrm upgraded from {{Gerrit|40189afa}} to {{Gerrit|bc3a8036}} == 2026-02-06 == * 15:41 ejegg: fundraising civicrm upgraded from {{Gerrit|06842fbf}} to {{Gerrit|40189afa}} * 06:24 eileen: civicrm upgraded from {{Gerrit|b63d7146}} to {{Gerrit|06842fbf}} == 2026-02-05 == * 15:25 dwisehaupt: all eqiad hosts powered down and ready for relocation. 
* 15:03 dwisehaupt: starting poweroff of eqiad hosts * 14:48 dwisehaupt: downtimes scheduled for frack eqiad hosts and cross colo replication in prep for rack expansion - [[phab:T403035|T403035]] * 04:30 eileen: civicrm upgraded from {{Gerrit|000dd548}} to {{Gerrit|b63d7146}} == 2026-02-04 == * 23:56 eileen: civicrm upgraded from {{Gerrit|4c2870c5}} to {{Gerrit|000dd548}} * 23:47 cstone: donorwiki upgraded from {{Gerrit|53bfb05b}} to {{Gerrit|93e0d03f}} * 21:51 ejegg: payments-wiki upgraded from {{Gerrit|a09a4f8f}} to {{Gerrit|93e0d03f}} * 21:46 eileen: civicrm upgraded from {{Gerrit|dd10342d}} to {{Gerrit|4c2870c5}} * 21:22 eileen: civicrm upgraded from {{Gerrit|8aa9274a}} to {{Gerrit|dd10342d}} * 03:48 eileen: * civicrm upgraded from {{Gerrit|14c7b7e7}} to {{Gerrit|8aa9274a}} == 2026-02-03 == * 21:55 dwisehaupt: as part of dns cleanup, we have removed the old civi1002.wikimedia.org entry. folks should be using civicrm.wm.o but there is a chance of super old bookmarks still being around. we can reinstate if it becomes an issue. 
* 21:05 eileen: config revision changed from {{Gerrit|23b2d9b6}} to {{Gerrit|45d40cf1}} * 21:01 larssandergreen: civicrm upgraded from {{Gerrit|10ab3659}} to {{Gerrit|14c7b7e7}} * 05:49 eileen: config revision changed from {{Gerrit|f348441d}} to {{Gerrit|23b2d9b6}} * 05:48 eileen: civicrm upgraded from {{Gerrit|a097bb3d}} to {{Gerrit|10ab3659}} * 01:16 wfan: civicrm upgraded from {{Gerrit|5b9f3cd4}} to {{Gerrit|a097bb3d}} * 00:24 wfan: donorwiki upgraded from {{Gerrit|3ffc70f0}} to {{Gerrit|53bfb05b}} == 2026-02-02 == * 23:50 wfan: payments-wiki upgraded from {{Gerrit|c035aa84}} to {{Gerrit|53bfb05b}} * 20:22 cstone: civicrm upgraded from {{Gerrit|f91f955b}} to {{Gerrit|5b9f3cd4}} * 18:55 ejegg: standalone (IPN listener) SmashPig upgraded from {{Gerrit|b42f2de4}} to {{Gerrit|937d6e40}} * 17:01 wfan: payments-wiki upgraded from {{Gerrit|c5cadd72}} to {{Gerrit|c035aa84}} * 15:26 ejegg: fundraising civicrm upgraded from {{Gerrit|611d18de}} to {{Gerrit|f91f955b}} == 2026-01-30 == * 17:02 larssandergreen: civicrm upgraded from {{Gerrit|79e4424e}} to {{Gerrit|611d18de}} * 04:18 cstone: civicrm upgraded from {{Gerrit|fe1af57a}} to {{Gerrit|79e4424e}} * 02:51 eileen: civicrm upgraded from {{Gerrit|ff772bee}} to {{Gerrit|fe1af57a}} * 02:28 eileen: civicrm upgraded from {{Gerrit|bcf976ae}} to {{Gerrit|ff772bee}} == 2026-01-29 == * 22:16 eileen: civicrm upgraded from {{Gerrit|5d121c63}} to {{Gerrit|bcf976ae}} * 20:17 eileen: civicrm upgraded from {{Gerrit|ebcfd009}} to {{Gerrit|5d121c63}} * 20:09 eileen: civicrm upgraded from {{Gerrit|5c065c4e}} to {{Gerrit|ebcfd009}} * 18:08 wfan: payments-wiki upgraded from {{Gerrit|81d9f614}} to {{Gerrit|c5cadd72}} == 2026-01-28 == * 19:36 jgleeson: SmashPig upgraded from {{Gerrit|96a6224d}} to {{Gerrit|b42f2de4}} * 18:53 jgleeson: civicrm upgraded from {{Gerrit|56c222da}} to {{Gerrit|5c065c4e}} * 18:53 jgleeson: SmashPig upgraded from {{Gerrit|96a6224d}} to {{Gerrit|b42f2de4}} * 14:41 jgleeson: payments-wiki upgraded from 
{{Gerrit|24915bdb}} to {{Gerrit|81d9f614}} * 07:30 eileen: civicrm upgraded from {{Gerrit|600b21a6}} to {{Gerrit|56c222da}} * 06:51 eileen: * civicrm upgraded from {{Gerrit|32f9a10d}} to {{Gerrit|600b21a6}} * 04:35 eileen: config revision changed from {{Gerrit|ed0808a9}} to {{Gerrit|ef6ef5f2}} * 03:42 eileen: civicrm upgraded from {{Gerrit|64267a34}} to {{Gerrit|32f9a10d}} * 02:13 eileen: civicrm upgraded from {{Gerrit|7299615a}} to {{Gerrit|64267a34}} == 2026-01-27 == * 21:12 larssandergreen: tools upgraded from {{Gerrit|84323460}} to {{Gerrit|7462b8bd}} * 05:57 cstone: civicrm upgraded from {{Gerrit|19f94835}} to {{Gerrit|75f443b5}} == 2026-01-26 == * 17:15 damilare: smashpig upgraded from {{Gerrit|8b4ebf34}} to {{Gerrit|96a6224d}} * 16:32 larssandergreen: tools upgraded from {{Gerrit|c75f7625}} to {{Gerrit|84323460}} * 01:43 eileen: config revision changed from {{Gerrit|2f71107f}} to {{Gerrit|ed0808a9}} switch to php dlocal downloader (now weekend is mostly over) - == 2026-01-25 == * 21:47 eileen: config revision changed from {{Gerrit|23023984}} to {{Gerrit|2f71107f}} == 2026-01-24 == * 02:19 cstone: civicrm upgraded from {{Gerrit|f7064a46}} to {{Gerrit|19f94835}} * 00:55 bd808: Testing #wikimedia-fundraising SAL integration ([[phab:T415389|T415389]]) <noinclude>[[Category:SAL]]</noinclude> bsjczc0jcj9u0t2g0k5j3k1gaoc8cz8 Data Platform Engineering/Automated Traffic Incident Runbook 0 459874 2414258 2387393 2026-05-15T13:47:27Z GGoncalves-WMF 42848 Revise as we close the Nov 2025 incident. 2414258 wikitext text/x-wiki This page is a guide for Data Engineering folks who are dealing with a data issue potentially related to misclassified automated traffic. Read this if... * ...you receive an alert from Airflow's <code>pageview_human_bot_daily</code> DAG, telling you that the proportion of human traffic has suddenly increased. 
* ...you are notified by an analyst or a Community member that the pageviews and unique devices metrics for a given set of wikis are seeing a suspicious spike since 3 weeks ago. === Tracking === Before continuing, '''file an [https://phabricator.wikimedia.org/maniphest/task/edit/form/23/?title=Automated%20traffic%20incident%3A%20%3Cshort%20impact%20summary%3E&tags=Data-Engineering,Movement-Insights,WMF-NDA&subscribers=ahoelzl,GGoncalves-WMF,KZimmerman,JerryWang-WMF,Osefu-WMF&description=This%20task%20tracks%20the%20joint%20investigation%20between%20DPE%20and%20RDS%20on%20a%20report%20of%20suspected%20bot%20traffic.%0A%0A&#x5B;&#x5D;%20%40Ahoelzl%20to%20appoint%20a%20Data%20Engineering%20point%20of%20contact%20(%22DE%22).%0A&#x5B;&#x5D;%20&#x5B;DE&#x5D;%20Provide%20an%20initial%20impact%20assessment%3A%20time%20window%20for%20investigation%2C%20%2F%2Frough%2F%2F%20impact%20(e.g.%20approximately%20%2B200%25%20human%20pageviews)%2C%20impacted%20projects.%0A&#x5B;&#x5D;%20&#x5B;DE&#x5D;%20&#x5B;Run%20basic%20checks&#x5D;(https%3A%2F%2Fwikitech.wikimedia.org%2Fwiki%2FData_Platform_Engineering%2FAutomated_Traffic_Incident_Runbook%23Initial_checks)%20to%20rule%20out%20infrastructure%20or%20data%20processing%20issues.%0A&#x5B;&#x5D;%20%40OSefu-WMF%20to%20confirm%20whether%20to%20continue%20a%20response%2C%20and%20appoint%20an%20RDS%20point%20of%20contact%20(%22RDS%22).%0A&#x5B;&#x5D;%20&#x5B;RDS&#x5D;%20File%20an%20FYI%20L3SC%20ticket%20to%20Legal%20about%20extending%20webrequest%20data%20retention%20(see%20&#x5B;previous%20approval&#x5D;(https%3A%2F%2Fapp.asana.com%2F1%2F3758245663860%2Ftask%2F1213320344653123%2Fcomment%2F1213427400582171%3Ffocus%3Dtrue)).%0A&#x5B;&#x5D;%20&#x5B;DE&#x5D;%20Pause%20deletion%20of%20data%20older%20than%2090%20days%20from%20`webrequest`.%0A&#x5B;&#x5D;%20&#x5B;RDS&#x5D;%20&#x5B;&#x5B;%20https%3A%2F%2Fwikitech.wikimedia.org%2Fwiki%2FData_Platform%2FSystems%2FDashiki%2FConfiguration%20|%20Annotate%20Wikistats%20&#x5D;&#x5D;%20and%20the%20&#x5B;&
#x5B;%20https%3A%2F%2Fsuperset.wikimedia.org%2Fsuperset%2Fdashboard%2F723%2F%3Fnative_filters_key%3DwAJ12f6PFeGWnvW0alXtqoHQ8AW2tKKBX3uPZTBCtEJK2xjFLTluXAS-LXa7Axrb%20|%20Movement%20Trends%20dashboard%20&#x5D;&#x5D;%20to%20flag%20the%20in-progress%20investigation.%0A&#x5B;&#x5D;%20&#x5B;RDS&#x5D;%20Characterize%20the%20suspected%20bot%20traffic%3A%20what%20fingerprints%20or%20signals%20correlate%20to%20the%20apparent%20automated%20traffic%3F%0A&#x5B;&#x5D;%20&#x5B;RDS&#x5D;%20Provide%20an%20updated%20impact%20assessment%20for%20the%20chosen%20fingerprint(s)%3A%20what%20classification%20changes%20should%20we%20introduce%20for%20them%3F%0A&#x5B;&#x5D;%20&#x5B;RDS&#x5D;%20Optionally%2C%20request%20a%20backfill%20to%20be%20done%20over%20the%20course%20of%206w.%0A&#x5B;&#x5D;%20&#x5B;DE&#x5D;%20Implement%20updates%20to%20bot%20detection%20model%20and%20enable%20them%20in%20production%2C%20with%20optional%20backfill.%0A&#x5B;&#x5D;%20&#x5B;RDS&#x5D;%20Update%20Wikistats%20and%20Movement%20Trends%20annotations%20to%20note%20the%20final%20impact%20and%20correction%2C%20if%20applicable.%0A&#x5B;&#x5D;%20&#x5B;DE&#x5D;%20Re-enable%20`webrequest`%2090-day%20expiration.%0A&#x5B;&#x5D;%20&#x5B;DE&#x5D;%20Write%20an%20incident%20report%20and%20file%20tickets%20for%20follow-up%20actions. automated traffic data incident ticket]''' to track the response to this incident. === Responsibilities === Data Engineering is responsible for: * Detecting the incident, ideally via automated alerting. * Providing an initial impact assessment. * Ruling out infrastructure issues as the cause. * Pausing DAGs to allow investigation and backfill. * Mitigating the incident by implementing changes specified by RDS after investigation. * Performing a backfill, if requested by RDS. Our partners at RDS are responsible for: * Investigating anomalous traffic to characterize it for mitigation. * Communicating the impact of the incident to internal and external audiences, as applicable. 
Use the incident tracking ticket you filed above to coordinate these actions.

=== Ruling out infrastructure issues ===
First, rule out the possibility that the suspicious behavior is caused by infrastructure or data processing issues.

Check the Airflow history of the DAGs that generate the suspicious data. For instance, if the affected data is pageviews and unique devices, you can check:
* <code>refine_webrequest_hourly_text</code>
* <code>webrequest_actor_metrics_hourly</code>
* <code>webrequest_actor_metrics_rollup_hourly</code>
* <code>webrequest_actor_label_hourly</code>
* <code>pageview_actor_hourly</code>
Look for frequent job failures, sensor timeouts, and large variations in task processing times. If the Airflow history looks normal, we can rule out an infrastructure or processing issue.

=== Initial impact statement ===
Update the tracking ticket with an initial time range, a rough estimate of the impact on metrics, and the affected projects.

You can do this by opening Turnilo's <code>pageviews_daily</code> datasource and checking whether the suspicious spike follows some of these bot patterns:
* Some wikis are affected, others aren't.
* Some countries are affected, others aren't.
* Some user agents are affected, others aren't.
* Both human and bot metrics either stay the same or go up.
Usually, when there's a bot traffic spike, part of it is properly classified by our system, which makes the bot metric go up, while the rest is not properly classified, which makes the human metric go up. If human pageviews go up but bot pageviews go down (or vice versa), it could indicate a classification issue or a data issue rather than an increase in automated traffic.

Look at the time range(s), the affected wikis, and the countries issuing the requests, and try to get a rough estimate of the magnitude of the increase in traffic.

=== Mitigating ===
RDS will be included in the tracking ticket through the above template, and will assign an analyst to help us find a pattern in the misclassified bot traffic so it can be isolated.
If we find a pattern that isolates the misclassified bot traffic, we should file a task to implement it in our automated traffic detection pipeline, and test the results alongside the analyst point of contact. To keep mitigation simple, the most likely outcome of this step is to apply the analyst's recommended updates to our list of known-bot fingerprints.

=== Backfilling ===
After applying bot detection changes, RDS may request that we backfill the data. We have an agreed budget of '''6 weeks''' to complete this operation, to limit the disruption it may cause to other Data Engineering priorities.
# '''File a ticket to track the backfill itself''' ([[phab:T421735|example]]).
# '''Identify which DAGs need to be rerun.''' Consider DAG dependencies: usually, if we rerun a DAG, we also have to rerun all the DAGs downstream of it, recursively.
# '''Make sure all affected DAGs use ExternalTaskSensors to sense for their parents'''. When a DAG with ExternalTaskSensors is rerun, it waits for its parents to finish before starting any processing that could result in corrupted data. This does not work well with HivePartitionSensors or other data sensors. Modify the affected DAGs if necessary.
# '''Identify the exact date ranges each DAG needs to be rerun for.''' Note that DAGs can be hourly, daily, weekly, or monthly, and they will probably need different rerun start and stop times. Some DAGs are windowed (for example, they look 24 hours back); calculate the rerun dates of their parent DAGs accordingly.
# '''Consider making a backup copy of the original data at the top of the pipeline tree.''' For instance, if the affected metrics are pageviews and unique devices, you could back up webrequest data. This way, if the rerun fails for any reason, we can recover the previous state of the data.
# '''Use the [[gitlab:-/snippets/260|rerun script]]''' (or improve it to better fit your needs!)
to clear the affected Airflow dag_runs robustly and at a pace sustainable for the cluster. The order in which you specify the DAGs to rerun is important. The script only reruns DAGs for 1 granularity at a time. You should rerun hourly DAGs first, then daily, then weekly and then monthly. Make sure you only start rerunning the monthly ones when the corresponding source data for the whole month has been previously rerun. You can run the script from the Kubernetes Airflow command line. # '''Vet the data as the reruns are happening'''. # '''Keep WMF analysts informed'''. Leave weekly updates to <code>#working-with-data</code> on Slack (RDS is responsible for communicating to other groups.) 9bfrlr2h4cfnkkrl4ja8k6um1a0eoam 2414259 2414258 2026-05-15T13:49:14Z GGoncalves-WMF 42848 Update template to tracker ticket. 2414259 wikitext text/x-wiki This page is a guide for Data Engineering folks who are dealing with a data issue potentially related to misclassified automated traffic. Read this if... * ...you receive an alert from Airflow's <code>pageview_human_bot_daily</code> DAG, telling you that the proportion of human traffic has suddenly increased. * ...you are notified by an analyst or a Community member that the pageviews and unique devices metrics for a given set of wikis are seeing a suspicious spike since 3 weeks ago. 
=== Tracking === Before continuing, '''file an [https://phabricator.wikimedia.org/maniphest/task/edit/form/23/?title=Automated%20traffic%20incident%3A%20%3Cshort%20impact%20summary%3E&tags=Data-Engineering,Movement-Insights,WMF-NDA&subscribers=ahoelzl,GGoncalves-WMF,KZimmerman,JerryWang-WMF,Osefu-WMF&description=This%20task%20tracks%20the%20joint%20investigation%20between%20DPE%20and%20RDS%20on%20a%20report%20of%20suspected%20bot%20traffic.%0A%0A%23%23%20Response%0A%0A%5B%5D%20%40Ahoelzl%20to%20appoint%20a%20Data%20Engineering%20point%20of%20contact%20%28%22DE%22%29.%0A%5B%5D%20%5BDE%5D%20%5BRule%20out%20infrastructure%20issues%5D%28https%3A%2F%2Fwikitech.wikimedia.org%2Fwiki%2FData_Platform_Engineering%2FAutomated_Traffic_Incident_Runbook%23Ruling_out_infrastructure_issues%29%20as%20the%20cause.%0A%5B%5D%20%5BDE%5D%20Provide%20an%20%5Binitial%20impact%20assessment%5D%28https%3A%2F%2Fwikitech.wikimedia.org%2Fwiki%2FData_Platform_Engineering%2FAutomated_Traffic_Incident_Runbook%23Initial_impact_statement%29%3A%20time%20window%20for%20investigation%2C%20%2F%2Frough%2F%2F%20impact%20%28e.g.%20approximately%20%2B200%25%20human%20pageviews%29%2C%20impacted%20projects.%0A%5B%5D%20%40OSefu-WMF%20to%20confirm%20whether%20to%20continue%20a%20response%2C%20and%20appoint%20an%20RDS%20point%20of%20contact%20%28%22RDS%22%29.%0A%5B%5D%20%5BRDS%5D%20File%20an%20FYI%20L3SC%20ticket%20to%20Legal%20about%20extending%20webrequest%20data%20retention%20%28see%20%5Bprevious%20approval%5D%28https%3A%2F%2Fapp.asana.com%2F1%2F3758245663860%2Ftask%2F1213320344653123%2Fcomment%2F1213427400582171%3Ffocus%3Dtrue%29%29.%0A%5B%5D%20%5BDE%5D%20Pause%20deletion%20of%20data%20older%20than%2090%20days%20from%20%60webrequest%60.%0A%5B%5D%20%5BRDS%5D%20%5B%5B%20https%3A%2F%2Fwikitech.wikimedia.org%2Fwiki%2FData_Platform%2FSystems%2FDashiki%2FConfiguration%20%7C%20Annotate%20Wikistats%20%5D%5D%20and%20the%20%5B%5B%20https%3A%2F%2Fsuperset.wikimedia.org%2Fsuperset%2Fdashboard%2F723%2F%3Fnative_filters_ke
y%3DwAJ12f6PFeGWnvW0alXtqoHQ8AW2tKKBX3uPZTBCtEJK2xjFLTluXAS-LXa7Axrb%20%7C%20Movement%20Trends%20dashboard%20%5D%5D%20to%20flag%20the%20in-progress%20investigation.%0A%5B%5D%20%5BRDS%5D%20Characterize%20the%20suspected%20bot%20traffic%3A%20what%20fingerprints%20or%20signals%20correlate%20to%20the%20apparent%20automated%20traffic%3F%0A%5B%5D%20%5BRDS%5D%20Provide%20an%20updated%20impact%20assessment%20for%20the%20chosen%20fingerprint%28s%29%3A%20what%20classification%20changes%20should%20we%20introduce%20for%20them%3F%0A%5B%5D%20%5BDE%5D%20If%20requested%20by%20RDS%2C%20do%20a%20%5Bbackfill%5D%28https%3A%2F%2Fwikitech.wikimedia.org%2Fwiki%2FData_Platform_Engineering%2FAutomated_Traffic_Incident_Runbook%23Backfilling%29%20for%20the%20relevant%20period%20to%20correct%20metrics.%0A%5B%5D%20%5BDE%5D%20Implement%20updates%20to%20bot%20detection%20model%20and%20enable%20them%20in%20production%2C%20with%20optional%20backfill.%0A%5B%5D%20%5BRDS%5D%20Update%20Wikistats%20and%20Movement%20Trends%20annotations%20to%20note%20the%20final%20impact%20and%20correction%2C%20if%20applicable.%0A%5B%5D%20%5BDE%5D%20Re-enable%20%60webrequest%60%2090-day%20expiration.%0A%5B%5D%20%5BDE%5D%20Write%20an%20incident%20report%20and%20file%20tickets%20for%20follow-up%20actions. automated traffic data incident ticket]''' to track the response to this incident. === Responsibilities === Data Engineering is responsible for: * Detecting the incident, ideally via automated alerting. * Providing an initial impact assessment. * Ruling out infrastructure issues as the cause. * Pausing DAGs to allow investigation and backfill. * Mitigating the incident by implementing changes specified by RDS after investigation. * Performing a backfill, if requested by RDS. Our partners at RDS are responsible for: * Investigating anomalous traffic to characterize it for mitigation. * Communicating the impact of the incident to internal and external audiences, as applicable. 
Use the incident tracking ticket you filed above to coordinate these actions.

=== Ruling out infrastructure issues ===
First, rule out the possibility that the suspicious behavior is caused by infrastructure or data processing issues.

Check the Airflow history of the DAGs that generate the suspicious data. For instance, if the affected data is pageviews and unique devices, you can check:
* <code>refine_webrequest_hourly_text</code>
* <code>webrequest_actor_metrics_hourly</code>
* <code>webrequest_actor_metrics_rollup_hourly</code>
* <code>webrequest_actor_label_hourly</code>
* <code>pageview_actor_hourly</code>
Look for frequent job failures, sensor timeouts, and large variations in task processing times. If the Airflow history looks normal, we can rule out an infrastructure or processing issue.

=== Initial impact statement ===
Update the tracking ticket with an initial time range, a rough estimate of the impact on metrics, and the affected projects.

You can do this by opening Turnilo's <code>pageviews_daily</code> datasource and checking whether the suspicious spike follows some of these bot patterns:
* Some wikis are affected, others aren't.
* Some countries are affected, others aren't.
* Some user agents are affected, others aren't.
* Both human and bot metrics either stay the same or go up.
Usually, when there's a bot traffic spike, part of it is properly classified by our system, which makes the bot metric go up, while the rest is not properly classified, which makes the human metric go up. If human pageviews go up but bot pageviews go down (or vice versa), it could indicate a classification issue or a data issue rather than an increase in automated traffic.

Look at the time range(s), the affected wikis, and the countries issuing the requests, and try to get a rough estimate of the magnitude of the increase in traffic.

=== Mitigating ===
RDS will be included in the tracking ticket through the above template, and will assign an analyst to help us find a pattern in the misclassified bot traffic so it can be isolated.
If we find a pattern that isolates the misclassified bot traffic, we should file a task to implement it in our automated traffic detection pipeline, and test the results alongside the analyst point of contact. To keep mitigation simple, the most likely outcome of this step is to apply the analyst's recommended updates to our list of known-bot fingerprints.

=== Backfilling ===
After applying bot detection changes, RDS may request that we backfill the data. We have an agreed budget of '''6 weeks''' to complete this operation, to limit the disruption it may cause to other Data Engineering priorities.
# '''File a ticket to track the backfill itself''' ([[phab:T421735|example]]).
# '''Identify which DAGs need to be rerun.''' Consider DAG dependencies: usually, if we rerun a DAG, we also have to rerun all the DAGs downstream of it, recursively.
# '''Make sure all affected DAGs use ExternalTaskSensors to sense for their parents'''. When a DAG with ExternalTaskSensors is rerun, it waits for its parents to finish before starting any processing that could result in corrupted data. This does not work well with HivePartitionSensors or other data sensors. Modify the affected DAGs if necessary.
# '''Identify the exact date ranges each DAG needs to be rerun for.''' Note that DAGs can be hourly, daily, weekly, or monthly, and they will probably need different rerun start and stop times. Some DAGs are windowed (for example, they look 24 hours back); calculate the rerun dates of their parent DAGs accordingly.
# '''Consider making a backup copy of the original data at the top of the pipeline tree.''' For instance, if the affected metrics are pageviews and unique devices, you could back up webrequest data. This way, if the rerun fails for any reason, we can recover the previous state of the data.
# '''Use the [[gitlab:-/snippets/260|rerun script]]''' (or improve it to better fit your needs!)
to clear the affected Airflow dag_runs robustly and at a pace sustainable for the cluster. The order in which you specify the DAGs to rerun is important. The script only reruns DAGs for 1 granularity at a time. You should rerun hourly DAGs first, then daily, then weekly and then monthly. Make sure you only start rerunning the monthly ones when the corresponding source data for the whole month has been previously rerun. You can run the script from the Kubernetes Airflow command line. # '''Vet the data as the reruns are happening'''. # '''Keep WMF analysts informed'''. Leave weekly updates to <code>#working-with-data</code> on Slack (RDS is responsible for communicating to other groups.) bzsv7zgpaz9q0o1oq4kjjblqoforjyx 2414260 2414259 2026-05-15T14:00:52Z GGoncalves-WMF 42848 Remove "Response" heading from ticket template. 2414260 wikitext text/x-wiki This page is a guide for Data Engineering folks who are dealing with a data issue potentially related to misclassified automated traffic. Read this if... * ...you receive an alert from Airflow's <code>pageview_human_bot_daily</code> DAG, telling you that the proportion of human traffic has suddenly increased. * ...you are notified by an analyst or a Community member that the pageviews and unique devices metrics for a given set of wikis are seeing a suspicious spike since 3 weeks ago. 
=== Tracking === Before continuing, '''file an [https://phabricator.wikimedia.org/maniphest/task/edit/form/23/?title=Automated%20traffic%20incident%3A%20%3Cshort%20impact%20summary%3E&tags=Data-Engineering,Movement-Insights,WMF-NDA&subscribers=ahoelzl,GGoncalves-WMF,KZimmerman,JerryWang-WMF,Osefu-WMF&description=This%20task%20tracks%20the%20joint%20investigation%20between%20DPE%20and%20RDS%20on%20a%20report%20of%20suspected%20bot%20traffic.%0A%0A%5B%5D%20%40Ahoelzl%20to%20appoint%20a%20Data%20Engineering%20point%20of%20contact%20%28%22DE%22%29.%0A%5B%5D%20%5BDE%5D%20%5BRule%20out%20infrastructure%20issues%5D%28https%3A%2F%2Fwikitech.wikimedia.org%2Fwiki%2FData_Platform_Engineering%2FAutomated_Traffic_Incident_Runbook%23Ruling_out_infrastructure_issues%29%20as%20the%20cause.%0A%5B%5D%20%5BDE%5D%20Provide%20an%20%5Binitial%20impact%20assessment%5D%28https%3A%2F%2Fwikitech.wikimedia.org%2Fwiki%2FData_Platform_Engineering%2FAutomated_Traffic_Incident_Runbook%23Initial_impact_statement%29%3A%20time%20window%20for%20investigation%2C%20%2F%2Frough%2F%2F%20impact%20%28e.g.%20approximately%20%2B200%25%20human%20pageviews%29%2C%20impacted%20projects.%0A%5B%5D%20%40OSefu-WMF%20to%20confirm%20whether%20to%20continue%20a%20response%2C%20and%20appoint%20an%20RDS%20point%20of%20contact%20%28%22RDS%22%29.%0A%5B%5D%20%5BRDS%5D%20File%20an%20FYI%20L3SC%20ticket%20to%20Legal%20about%20extending%20webrequest%20data%20retention%20%28see%20%5Bprevious%20approval%5D%28https%3A%2F%2Fapp.asana.com%2F1%2F3758245663860%2Ftask%2F1213320344653123%2Fcomment%2F1213427400582171%3Ffocus%3Dtrue%29%29.%0A%5B%5D%20%5BDE%5D%20Pause%20deletion%20of%20data%20older%20than%2090%20days%20from%20%60webrequest%60.%0A%5B%5D%20%5BRDS%5D%20%5B%5B%20https%3A%2F%2Fwikitech.wikimedia.org%2Fwiki%2FData_Platform%2FSystems%2FDashiki%2FConfiguration%20%7C%20Annotate%20Wikistats%20%5D%5D%20and%20the%20%5B%5B%20https%3A%2F%2Fsuperset.wikimedia.org%2Fsuperset%2Fdashboard%2F723%2F%3Fnative_filters_key%3DwAJ12f6PFeGWnvW0alX
tqoHQ8AW2tKKBX3uPZTBCtEJK2xjFLTluXAS-LXa7Axrb%20%7C%20Movement%20Trends%20dashboard%20%5D%5D%20to%20flag%20the%20in-progress%20investigation.%0A%5B%5D%20%5BRDS%5D%20Characterize%20the%20suspected%20bot%20traffic%3A%20what%20fingerprints%20or%20signals%20correlate%20to%20the%20apparent%20automated%20traffic%3F%0A%5B%5D%20%5BRDS%5D%20Provide%20an%20updated%20impact%20assessment%20for%20the%20chosen%20fingerprint%28s%29%3A%20what%20classification%20changes%20should%20we%20introduce%20for%20them%3F%0A%5B%5D%20%5BDE%5D%20If%20requested%20by%20RDS%2C%20do%20a%20%5Bbackfill%5D%28https%3A%2F%2Fwikitech.wikimedia.org%2Fwiki%2FData_Platform_Engineering%2FAutomated_Traffic_Incident_Runbook%23Backfilling%29%20for%20the%20relevant%20period%20to%20correct%20metrics.%0A%5B%5D%20%5BDE%5D%20Implement%20updates%20to%20bot%20detection%20model%20and%20enable%20them%20in%20production%2C%20with%20optional%20backfill.%0A%5B%5D%20%5BRDS%5D%20Update%20Wikistats%20and%20Movement%20Trends%20annotations%20to%20note%20the%20final%20impact%20and%20correction%2C%20if%20applicable.%0A%5B%5D%20%5BDE%5D%20Re-enable%20%60webrequest%60%2090-day%20expiration.%0A%5B%5D%20%5BDE%5D%20Write%20an%20incident%20report%20and%20file%20tickets%20for%20follow-up%20actions. automated traffic data incident ticket]''' to track the response to this incident. === Responsibilities === Data Engineering is responsible for: * Detecting the incident, ideally via automated alerting. * Providing an initial impact assessment. * Ruling out infrastructure issues as the cause. * Pausing DAGs to allow investigation and backfill. * Mitigating the incident by implementing changes specified by RDS after investigation. * Performing a backfill, if requested by RDS. Our partners at RDS are responsible for: * Investigating anomalous traffic to characterize it for mitigation. * Communicating the impact of the incident to internal and external audiences, as applicable. 
Use the incident tracking ticket you filed above to coordinate these actions.

=== Ruling out infrastructure issues ===
First, rule out the possibility that the suspicious behavior is caused by infrastructure or data processing issues.

Check the Airflow history of the DAGs that generate the suspicious data. For instance, if the affected data is pageviews and unique devices, you can check:
* <code>refine_webrequest_hourly_text</code>
* <code>webrequest_actor_metrics_hourly</code>
* <code>webrequest_actor_metrics_rollup_hourly</code>
* <code>webrequest_actor_label_hourly</code>
* <code>pageview_actor_hourly</code>
Look for frequent job failures, sensor timeouts, and large variations in task processing times. If the Airflow history looks normal, we can rule out an infrastructure or processing issue.

=== Initial impact statement ===
Update the tracking ticket with an initial time range, a rough estimate of the impact on metrics, and the affected projects.

You can do this by opening Turnilo's <code>pageviews_daily</code> datasource and checking whether the suspicious spike follows some of these bot patterns:
* Some wikis are affected, others aren't.
* Some countries are affected, others aren't.
* Some user agents are affected, others aren't.
* Both human and bot metrics either stay the same or go up.
Usually, when there's a bot traffic spike, part of it is properly classified by our system, which makes the bot metric go up, while the rest is not properly classified, which makes the human metric go up. If human pageviews go up but bot pageviews go down (or vice versa), it could indicate a classification issue or a data issue rather than an increase in automated traffic.

Look at the time range(s), the affected wikis, and the countries issuing the requests, and try to get a rough estimate of the magnitude of the increase in traffic.

=== Mitigating ===
RDS will be included in the tracking ticket through the above template, and will assign an analyst to help us find a pattern in the misclassified bot traffic so it can be isolated.
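As an aid when writing the initial impact statement, the direction check on human and bot metrics can be sketched in a few lines of Python. This is only an illustration and not part of our pipeline: the function name <code>classify_spike</code>, the 5% tolerance, and the sample numbers are all assumptions made for the example. In practice you would read the before and after totals from Turnilo.

<syntaxhighlight lang="python">
def classify_spike(human_before, human_after, bot_before, bot_after, tolerance=0.05):
    """Give a rough reading of a change in human and bot pageviews.

    The tolerance (relative change treated as "flat") is an assumed value,
    not a threshold used by any production job.
    """
    def direction(before, after):
        if before <= 0:
            raise ValueError("baseline must be positive")
        change = (after - before) / before
        if change > tolerance:
            return "up"
        if change < -tolerance:
            return "down"
        return "flat"

    human = direction(human_before, human_after)
    bot = direction(bot_before, bot_after)

    # One metric up and the other down points away from a bot traffic spike.
    if {human, bot} == {"up", "down"}:
        return "possible classification or data issue"
    # At least one metric up and neither down matches the bot spike pattern.
    if "up" in (human, bot) and "down" not in (human, bot):
        return "consistent with automated traffic increase"
    return "inconclusive"


# Hypothetical daily totals before and during the suspected incident.
print(classify_spike(1000, 3000, 500, 700))
print(classify_spike(1000, 3000, 500, 200))
</syntaxhighlight>

A result of "consistent with automated traffic increase" is not proof on its own; it only means the spike is worth handing to RDS for characterization.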
If we find a pattern that we can use to isolate the misclassified bot traffic, we should file a task to implement the pattern in our automated traffic detection pipeline, and test the results together with the analyst point of contact. To keep mitigation simple, the most likely outcome of this step is to update our list of known-bot fingerprints as recommended by the analyst. === Backfilling === After applying bot detection changes, RDS may request that we backfill the data. We have an agreed budget of '''6 weeks''' to complete this operation, to control the disruption it may cause to other Data Engineering priorities. # '''File a ticket to track the backfill itself''' ([[phab:T421735|example]]). # '''Identify which DAGs need to be rerun.''' Consider DAG dependencies: usually, if we rerun a DAG, we also have to rerun all of its dependencies, recursively. # '''Make sure all affected DAGs use ExternalTaskSensors to sense for their parents'''. When a DAG that uses ExternalTaskSensors is rerun, it waits for its parents to finish before starting any processing that could result in corrupted data. This does not work well with HivePartitionSensors or other data sensors. Modify the affected DAGs if necessary. # '''Identify the exact date ranges each DAG needs to be rerun for.''' Note that DAGs can be hourly, daily, weekly, or monthly, and they will probably need different rerun start/stop times. Some DAGs are windowed (for example, they look 24 hours back), so calculate the rerun dates of their parent DAGs accordingly. # '''Consider making a backup copy of the original data at the top of the pipeline tree.''' For instance, if the affected metrics are pageviews and unique devices, you could back up webrequest data. This way, if the rerun fails for any reason, we would be able to recover the previous state of the data. # '''Use the [[gitlab:-/snippets/260|rerun script]]''' (or improve it to better fit your needs!)
to clear the affected Airflow dag_runs robustly and at a pace sustainable for the cluster. The order in which you specify the DAGs to rerun is important. The script only reruns DAGs for one granularity at a time. You should rerun hourly DAGs first, then daily, then weekly, and then monthly. Make sure you only start rerunning the monthly ones once the corresponding source data for the whole month has been rerun. You can run the script from the Kubernetes Airflow command line. # '''Vet the data as the reruns are happening'''. # '''Keep WMF analysts informed'''. Post weekly updates in <code>#working-with-data</code> on Slack (RDS is responsible for communicating with other groups). n0wedgde8z9vn8572ku0a0wccmli79t User:Effie Mouzeli (WMF)/SCROLL/Template 2 460100 2414255 2414246 2026-05-15T12:25:03Z Effie Mouzeli (WMF) 12880 2414255 wikitext text/x-wiki {{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: <!-- service name --> = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! 
colspan="2" class="scroll-card-header" | 📜 Service identity |- | class="scroll-card-label" | '''Service''' || <span class="scroll-placeholder">service name</span> |- | class="scroll-card-label" | '''Owner''' || <span class="scroll-placeholder">team name</span> |- | class="scroll-card-label" | '''SCROLL bearer''' || <span class="scroll-placeholder">@sre-reviewer</span> |- | class="scroll-card-label" | '''Soft Launch Target''' (some users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''Full Launch Target''' (all users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''SCROLL epic''' || <span class="scroll-placeholder">T000000</span> |} | style="vertical-align: top; width: 60%;" | <!-- RIGHT: at-a-glance --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! colspan="2" class="scroll-card-header" | At a glance |- | class="scroll-card-label" | '''Type of Request''' || <span class="scroll-placeholder">service / extension / core feature / feature</span> |- | class="scroll-card-label" | '''Phabricator tags''' || <span class="scroll-placeholder">#tagname</span> |- | class="scroll-card-label" | '''Service Ownership and Contact Information''' || <span class="scroll-placeholder">link to team page</span> |- | class="scroll-card-label" | '''Repository URL''' || <span class="scroll-placeholder">repo URL</span> |- | class="scroll-card-label" | '''Wikitech Page URL''' || <span class="scroll-placeholder">wikitech page</span> |- | class="scroll-card-label" | '''Google Drive URL''' || <span class="scroll-placeholder">Drive URL (if applicable)</span> |- | class="scroll-card-label" | '''Design Document''' || <span class="scroll-placeholder">link</span> |- | class="scroll-card-label" | '''Service Health Dashboards''' || <span class="scroll-placeholder">Grafana link</span> |- | class="scroll-card-label" | '''Technical Runbook''' || <span 
class="scroll-placeholder">wikitech page</span> |} |} {| class="scroll-legend" |- | '''Priority:''' &nbsp; 🚀 Required for soft launch &nbsp;·&nbsp; 💯 Required for full launch &nbsp;·&nbsp; ❓ Needs scoping / may not be applicable |- | '''Required for:''' &nbsp; ⚙️ Service &nbsp;·&nbsp; 🧩 Extension &nbsp;·&nbsp; 🌻 Core Feature &nbsp;·&nbsp; ✨ Feature |} == 1. Service Summary == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Documentation |- | style="text-align: center;" | '''1.0''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Wikitech page? || style="text-align: center;" | || <span class="scroll-remarks">Wikitech page (Template will be provided soon)</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Is your component present on the Service Catalogue? || style="text-align: center;" | || <span class="scroll-remarks">The Service Catalogue is the canonical inventory of WMF services. 
The component should have an entry there.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Contacting The Team |- | style="text-align: center;" | '''1.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Related Phabricator Tags || style="text-align: center;" | || <span class="scroll-remarks">List the Phabricator project tags associated with this component. This routes bug reports and tasks to the right team</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are the team's contact details documented on the Wikitech page and verified in officewiki? || style="text-align: center;" | || <span class="scroll-remarks">Ensure that contact info and team structure are up to date</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | External Reviews |- | style="text-align: center;" | '''1.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by SRE? 
|| style="text-align: center;" | || <span class="scroll-remarks">Schedule a meeting with SRE early on, both to agree target dates (soft launch, full launch) and to walk through the checklist together so you can confirm which items are relevant to your service and which can be skipped</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Security? || style="text-align: center;" | || <span class="scroll-remarks">Security team is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Data Persistence? || style="text-align: center;" | || <span class="scroll-remarks">Data Persistence is aware of this work and has communicated their requirements (if applicable).</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the service/feature been reviewed by the SLO working group? || style="text-align: center;" | || <span class="scroll-remarks">The SLO working group is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |} == 2. 
Operating Procedures == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Interactions |- | style="text-align: center;" | '''2.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have you described how the service/feature interacts with common MediaWiki user flows? 
|| style="text-align: center;" | || <span class="scroll-remarks">Engineers should understand where this component sits on the critical path and be able to assess the impact when something goes wrong.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Infrastructure |- | style="text-align: center;" | '''2.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service in Puppet's ServiceCatalogue (service.yaml)? || style="text-align: center;" | || <span class="scroll-remarks">If this is a standalone service accepting traffic, it must exist in service.yaml</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service running on VMs/bare metal or Kubernetes? || style="text-align: center;" | || <span class="scroll-remarks">If on bare metal/VMs, please provide prefixes. If on k8s, please provide the cluster name here.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service use a Helm chart? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this is a Kubernetes deployment, it must have a Helm chart</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a Kubernetes Service? || style="text-align: center;" | || <span class="scroll-remarks">If this deployment is accepting traffic from outside Kubernetes, it must have a Service</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Service URL || style="text-align: center;" | || <span class="scroll-remarks">The URL where this service can be reached.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a staging environment? If yes, please fill in the URL. || style="text-align: center;" | || <span class="scroll-placeholder">—</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Traffic |- | style="text-align: center;" | '''2.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Do you have an estimation of the traffic your service will be serving? 
|| style="text-align: center;" | || <span class="scroll-remarks">Teams should be able to work out an estimation of what traffic they expect, as well as what methodology was used. If that's not straightforward, please reach out to SRE and we can work through it together.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.9''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service accept traffic directly from the CDN? || style="text-align: center;" | || <span class="scroll-remarks">If your service has public endpoints, SRE Traffic may need to provide additional configuration for routing and caching.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.10''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service have a discovery URL? || style="text-align: center;" | || <span class="scroll-remarks">If this service is either active/active or active/passive, it must have a discovery URL</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.11''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Can the service be depooled safely and run from a single DC? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this is an active/active service, can it tolerate one datacentre being depooled without user-visible impact?</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified which systems/datastores your service needs to communicate with? || style="text-align: center;" | || <span class="scroll-remarks">A clear list of dependencies helps with capacity planning as well as monitoring</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Maintenance |- | style="text-align: center;" | '''2.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || If you have dependencies on maintenance scripts (mw-script) or crons (mw-cron), have they been documented and recently tested? || style="text-align: center;" | || <span class="scroll-remarks">Maintenance scripts and crons often go untested for long periods. Documenting and testing them prevents surprises when they fail or need to be re-run.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 3. Release Confidence == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! 
style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Building and Testing |- | style="text-align: center;" | '''3.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your Wikitech page link to the code repository and production branch? || style="text-align: center;" | || <span class="scroll-remarks">Direct links to the repo and the branch running in production make it easy to find the right code</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your Wikitech page document the name and location of the most recent image? || style="text-align: center;" | || <span class="scroll-remarks">The container image name and location on the registry. If the image version is defined in a non-standard location, this must be documented here.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have integration and unit tests (CI)? 
|| style="text-align: center;" | || <span class="scroll-placeholder">—</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Initial Rollout |- | style="text-align: center;" | '''3.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified common failure points on launch day (soft and full launch)? || style="text-align: center;" | || <span class="scroll-remarks">Knowing the likely failure modes, e.g. cold caches or an overwhelmed dependency, helps you prepare mitigations for launch day</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a rollout plan? || style="text-align: center;" | || <span class="scroll-remarks">A documented rollout plan covering the deployment sequence, smoke tests, rollback steps, and communication.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Deploying to Production |- | style="text-align: center;" | '''3.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have deployers in your team? 
|| style="text-align: center;" | || <span class="scroll-remarks">Team members ready and authorised to deploy means you can roll out changes and fixes on your own schedule, rather than waiting for help</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.7''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor your error budget during deployments? || style="text-align: center;" | || <span class="scroll-remarks">Monitoring error budget during a deploy catches regressions early and provides a clear signal for whether to continue or roll back.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 4. Observability and incident response == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Metrics, Instrumentation, Logging |- | style="text-align: center;" | '''4.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are you exporting Prometheus metrics and sending logs to Logstash? 
|| style="text-align: center;" | || <span class="scroll-remarks">Prometheus and Logstash are WMF's standard tools for metrics and logs. Exporting to both is the baseline for any observable service.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are key user flows and business metrics instrumented and exported? || style="text-align: center;" | || <span class="scroll-remarks">Instrumenting user-facing flows and business outcomes helps not only measure what matters to users, but also assess impact during incidents</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Level Objectives |- | style="text-align: center;" | '''4.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have SLOs been drafted to assist in evaluating the impact on end users? || style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.4''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have the relevant SLIs been identified and visualised in Grafana? 
|| style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Monitoring |- | style="text-align: center;" | '''4.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have (a) Grafana dashboard(s)? || style="text-align: center;" | || <span class="scroll-remarks">A Grafana dashboard used by both devs and SREs, clearly showing the component health</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Logstash dashboard? || style="text-align: center;" | || <span class="scroll-remarks">A Logstash dashboard surfaces application logs and errors, complementing Grafana metrics</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.9''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your dashboard include links to related dashboards, documents, and/or other URLs? 
|| style="text-align: center;" | || <span class="scroll-remarks">Cross-links to related dashboards as well as dependencies, runbooks, and documentation</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.10''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || If you have external dependencies, do you monitor their status? || style="text-align: center;" | || <span class="scroll-remarks">External dependency health (databases, APIs, third-party services) often explains service issues. Try to include panels or links to them</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.11''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor latency variations at the p50, p75, and p99 percentiles (e.g. via Envoy, or other business metrics)? || style="text-align: center;" | || <span class="scroll-remarks">Dashboard must include latency metrics.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Alerting |- | style="text-align: center;" | '''4.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified which alerts may need to page on-callers? 
|| style="text-align: center;" | || <span class="scroll-remarks">Identify which alerts should page on-callers, and which should only notify the dev team</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are there alerts for excessive errors (business, infrastructure, budget burn rate)? || style="text-align: center;" | || <span class="scroll-remarks">Alerts on different layers catch different failures</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.14''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are your alerts linked to runbooks? || style="text-align: center;" | || <span class="scroll-remarks">If there are alerts, are they linked to the appropriate runbooks and/or dashboards?</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Incident Response |- | style="text-align: center;" | '''4.15''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Can the right people be found promptly when needed to address service issues? || style="text-align: center;" | || <span class="scroll-remarks">Responders should know how to reach the dev team quickly during an incident, with clear escalation paths in place</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 5. 
Reliability and performance == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Resources |- | style="text-align: center;" | '''5.1''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Do you have an estimation of the resources you will need? || style="text-align: center;" | || <span class="scroll-remarks">Estimated CPU, memory, and storage requirements drive capacity planning.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.2''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Is the service designed to scale up or down as needed? 
|| style="text-align: center;" | || <span class="scroll-remarks">SRE should be able to add/remove resources on demand without contacting the team.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Reliability |- | style="text-align: center;" | '''5.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does each component have its own health or liveness check to ensure production traffic does not reach an unhealthy endpoint? || style="text-align: center;" | || <span class="scroll-remarks">Liveness and readiness checks should be in place for Kubernetes as well as for alerting purposes</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified your system SPOFs? || style="text-align: center;" | || <span class="scroll-remarks">Single Points of Failure are components whose loss takes down the service. Identifying them is the first step to mitigating or accepting the risk.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.5''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are there protections to keep the service performing reliably under pressure (rate limiting, load shedding, graceful degradation)? || style="text-align: center;" | || <span class="scroll-remarks">Under load, services should degrade gracefully rather than collapse. 
Patterns like rate limiting, load shedding, and circuit breakers protect both the service and its dependencies.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.6''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are backoff, retry, and fallback or failover strategies defined for the service and its dependencies? || style="text-align: center;" | || <span class="scroll-remarks">Well-defined retry and fallback behaviour prevents a component from collapsing when dependencies misbehave.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.7''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Is the Bus Factor for this service or feature at least 2? || style="text-align: center;" | || <span class="scroll-remarks">At least two people should understand the service well enough to ensure its operation and longevity</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} [[Category:SCROLL reviews]] lw1c68p04eetp76tzaz49tz9vhuj1ep 2414264 2414255 2026-05-15T15:24:01Z Effie Mouzeli (WMF) 12880 2414264 wikitext text/x-wiki {{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: <!-- service name --> = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! 
colspan="2" class="scroll-card-header" | 📜 Service identity |- | class="scroll-card-label" | '''Service''' || <span class="scroll-placeholder">service name</span> |- | class="scroll-card-label" | '''Owner''' || <span class="scroll-placeholder">team name</span> |- | class="scroll-card-label" | '''SCROLL bearer''' || <span class="scroll-placeholder">@sre-reviewer</span> |- | class="scroll-card-label" | '''Soft Launch Target''' (some users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''Full Launch Target''' (all users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''SCROLL epic''' || <span class="scroll-placeholder"><nowiki>[[phab:T000000|]]</nowiki></span> |} | style="vertical-align: top; width: 60%;" | <!-- RIGHT: at-a-glance --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! colspan="2" class="scroll-card-header" | At a glance |- | class="scroll-card-label" | '''Type of Request''' || <span class="scroll-placeholder">service / extension / core feature / feature</span> |- | class="scroll-card-label" | '''Phabricator tags''' || <span class="scroll-placeholder">#tagname</span> |- | class="scroll-card-label" | '''Service Ownership and Contact Information''' || <span class="scroll-placeholder">link to team page</span> |- | class="scroll-card-label" | '''Repository URL''' || <span class="scroll-placeholder">repo URL</span> |- | class="scroll-card-label" | '''Wikitech Page URL''' || <span class="scroll-placeholder">wikitech page</span> |- | class="scroll-card-label" | '''Google Drive URL''' || <span class="scroll-placeholder">Drive URL (if applicable)</span> |- | class="scroll-card-label" | '''Design Document''' || <span class="scroll-placeholder">link</span> |- | class="scroll-card-label" | '''Service Health Dashboards''' || <span class="scroll-placeholder">Grafana link</span> |- | class="scroll-card-label" | '''Technical Runbook''' || <span 
class="scroll-placeholder">wikitech page</span> |} |} {| class="scroll-legend" |- | '''Priority:''' &nbsp; 🚀 Required for soft launch &nbsp;·&nbsp; 💯 Required for full launch &nbsp;·&nbsp; ❓ Needs scoping / may not be applicable |- | '''Required for:''' &nbsp; ⚙️ Service &nbsp;·&nbsp; 🧩 Extension &nbsp;·&nbsp; 🌻 Core Feature &nbsp;·&nbsp; ✨ Feature |} == 1. Service Summary == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Documentation |- | style="text-align: center;" | '''1.0''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Wikitech page? || style="text-align: center;" | || <span class="scroll-remarks">Wikitech page (Template will be provided soon)</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Is your component present on the Service Catalogue? || style="text-align: center;" | || <span class="scroll-remarks">The Service Catalogue is the canonical inventory of WMF services. 
Each component should have an entry there.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Contacting The Team |- | style="text-align: center;" | '''1.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Related Phabricator Tags || style="text-align: center;" | || <span class="scroll-remarks">List the Phabricator project tags associated with this component. This routes bug reports and tasks to the right team</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are the team's contact details documented on the Wikitech page and verified in officewiki? || style="text-align: center;" | || <span class="scroll-remarks">Ensure that contact info and team structure are up to date</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | External Reviews |- | style="text-align: center;" | '''1.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by SRE? 
|| style="text-align: center;" | || <span class="scroll-remarks">Schedule a meeting with SRE early on, both to agree target dates (soft launch, full launch) and to walk through the checklist together so you can confirm which items are relevant to your service and which can be skipped</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Security? || style="text-align: center;" | || <span class="scroll-remarks">Security team is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Data Persistence? || style="text-align: center;" | || <span class="scroll-remarks">Data Persistence is aware of this work and has communicated their requirements (if applicable).</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the service/feature been reviewed by the SLO working group? || style="text-align: center;" | || <span class="scroll-remarks">The SLO working group is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |} == 2. 
Operating Procedures == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Interactions |- | style="text-align: center;" | '''2.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have you described how the service/feature interacts with common MediaWiki user flows? 
|| style="text-align: center;" | || <span class="scroll-remarks">Engineers should understand where this component sits on the critical path and be able to assess the impact when something goes wrong.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Infrastructure |- | style="text-align: center;" | '''2.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service in Puppet's ServiceCatalogue (service.yaml)? || style="text-align: center;" | || <span class="scroll-remarks">If this is a standalone service accepting traffic, it must exist in service.yaml</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service running on VMs/Baremetal or Kubernetes? || style="text-align: center;" | || <span class="scroll-remarks">If on baremetal/VMs please provide prefixes. If on k8s, please provide the cluster name here.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service use a Helm chart? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this is a Kubernetes deployment, it must have a Helm chart</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a Kubernetes service? || style="text-align: center;" | || <span class="scroll-remarks">If this deployment is accepting traffic from outside of Kubernetes, it must have a service</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Service URL || style="text-align: center;" | || <span class="scroll-remarks">The URL where this service can be reached.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a staging environment? If yes, please fill in the URL. || style="text-align: center;" | || <span class="scroll-placeholder">—</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Traffic |- | style="text-align: center;" | '''2.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Do you have an estimation of the traffic your service will be serving? 
|| style="text-align: center;" | || <span class="scroll-remarks">Teams should be able to work out an estimation of what traffic they expect, as well as what methodology was used. If that's not straightforward, please reach out to SRE and we can work through it together.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.9''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service accept traffic directly from the CDN? || style="text-align: center;" | || <span class="scroll-remarks">If your service has public endpoints, SRE Traffic may need to provide additional configuration for routing and caching.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.10''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service have a discovery URL? || style="text-align: center;" | || <span class="scroll-remarks">If this service is either active/active or active/passive, it must have a discovery URL</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.11''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Can the service be depooled safely and run from a single DC? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this is an active/active service, can it tolerate one datacentre being depooled without user-visible impact?</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified which systems/datastores your service needs to communicate with? || style="text-align: center;" | || <span class="scroll-remarks">A clear list of dependencies helps with capacity planning as well as monitoring</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Maintenance |- | style="text-align: center;" | '''2.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || If you have dependencies on maintenance scripts (mw-script) or crons (mw-cron), have they been documented and recently tested? || style="text-align: center;" | || <span class="scroll-remarks">Maintenance scripts and crons often go untested for long periods. Documenting and testing them prevents surprises when they fail or need to be re-run.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 3. Release Confidence == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! 
style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Building and Testing |- | style="text-align: center;" | '''3.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your Wikitech page link to the code repository and production branch? || style="text-align: center;" | || <span class="scroll-remarks">Direct links to the repo and the branch running in production make it easy to find the right code</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your Wikitech page document the name and location of the most recent image? || style="text-align: center;" | || <span class="scroll-remarks">The container image name and location on the registry. If the image version is defined in a non-standard location, this must be documented here.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have integration and unit tests (CI)? 
|| style="text-align: center;" | || <span class="scroll-placeholder">—</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Initial Rollout |- | style="text-align: center;" | '''3.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified common failure points on launch day (soft and full launch)? || style="text-align: center;" | || <span class="scroll-remarks">Knowing the likely failure modes, e.g. cold caches or an overwhelmed dependency, helps you prepare mitigations for launch day</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a rollout plan? || style="text-align: center;" | || <span class="scroll-remarks">A documented rollout plan covering the deployment sequence, smoke tests, rollback steps, and communication.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Deploying to Production |- | style="text-align: center;" | '''3.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have deployers in your team? 
|| style="text-align: center;" | || <span class="scroll-remarks">Team members ready and authorised to deploy means you can roll out changes and fixes on your own schedule, rather than waiting for help</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.7''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor your error budget during deployments? || style="text-align: center;" | || <span class="scroll-remarks">Monitoring error budget during a deploy catches regressions early and provides a clear signal for whether to continue or roll back.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 4. Observability and incident response == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Metrics, Instrumentation, Logging |- | style="text-align: center;" | '''4.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are you exporting Prometheus metrics and sending logs to Logstash? 
|| style="text-align: center;" | || <span class="scroll-remarks">Prometheus and Logstash are WMF's standard tools for metrics and logs. Exporting to both is the baseline for any observable service.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are key user flows and business metrics instrumented and exported? || style="text-align: center;" | || <span class="scroll-remarks">Instrumenting user-facing flows and business outcomes helps not only measure what matters to users, but assess impact during incidents</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Level Objectives |- | style="text-align: center;" | '''4.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have SLOs been drafted to assist in evaluating the impact on end users? || style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.4''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have the relevant SLIs been identified and visualised in Grafana? 
|| style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Monitoring |- | style="text-align: center;" | '''4.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have (a) Grafana dashboard(s)? || style="text-align: center;" | || <span class="scroll-remarks">A Grafana dashboard used by both devs and SREs, clearly showing the component health</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Logstash dashboard? || style="text-align: center;" | || <span class="scroll-remarks">A Logstash dashboard surfaces application logs and errors, complementing Grafana metrics</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.9''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your dashboard include links to related dashboards, documents, and/or other URLs? 
|| style="text-align: center;" | || <span class="scroll-remarks">Cross-links to related dashboards as well as dependencies, runbooks, and documentation</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.10''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || If you have external dependencies, do you monitor their status? || style="text-align: center;" | || <span class="scroll-remarks">External dependency health (databases, APIs, third-party services) often explains service issues. Try to include panels or links to them</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.11''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor latency variations at the p50, p75, and p99 percentiles (e.g. via Envoy, or other business metrics)? || style="text-align: center;" | || <span class="scroll-remarks">Dashboard must include latency metrics.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Alerting |- | style="text-align: center;" | '''4.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified which alerts may need to page on-callers? 
|| style="text-align: center;" | || <span class="scroll-remarks">Identify which alerts should page on-callers, and which should only notify the dev team</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are there alerts for excessive errors (business, infrastructure, budget burn rate)? || style="text-align: center;" | || <span class="scroll-remarks">Alerts on different layers catch different failures</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.14''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are your alerts linked to runbooks? || style="text-align: center;" | || <span class="scroll-remarks">If there are alerts, are they linked to the appropriate runbooks and/or dashboards?</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Incident Response |- | style="text-align: center;" | '''4.15''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Can the right people be found promptly when needed to address service issues? || style="text-align: center;" | || <span class="scroll-remarks">Responders should know how to reach the dev team quickly during an incident, with clear escalation paths in place</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 5. 
Reliability and performance == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Resources |- | style="text-align: center;" | '''5.1''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Do you have an estimation of the resources you will need? || style="text-align: center;" | || <span class="scroll-remarks">Estimated CPU, memory, and storage requirements drive capacity planning.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.2''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Is the service designed to scale up or down as needed? 
|| style="text-align: center;" | || <span class="scroll-remarks">SRE should be able to add/remove resources on demand without contacting the team.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Reliability |- | style="text-align: center;" | '''5.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does each component have its own health or liveness check to ensure production traffic does not reach an unhealthy endpoint? || style="text-align: center;" | || <span class="scroll-remarks">Liveness and readiness checks should be in place for Kubernetes as well as for alerting purposes</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified your system SPOFs? || style="text-align: center;" | || <span class="scroll-remarks">Single Points of Failure are components whose loss takes down the service. Identifying them is the first step to mitigating or accepting the risk.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.5''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are there protections to keep the service performing reliably under pressure (rate limiting, load shedding, graceful degradation)? || style="text-align: center;" | || <span class="scroll-remarks">Under load, services should degrade gracefully rather than collapse. 
Patterns like rate limiting, load shedding, and circuit breakers protect both the service and its dependencies.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.6''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are backoff, retry, and fallback or failover strategies defined for the service and its dependencies? || style="text-align: center;" | || <span class="scroll-remarks">Well-defined retry and fallback behaviour prevents a component from collapsing when dependencies misbehave.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.7''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Is the Bus Factor for this service or feature at least 2? || style="text-align: center;" | || <span class="scroll-remarks">At least two people should understand the service well enough to ensure its operation and longevity</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} [[Category:SCROLL reviews]] m1ztvnlrnrhoqu7f2ovw9fxqsop3fl1 2414273 2414264 2026-05-15T15:57:06Z Effie Mouzeli (WMF) 12880 2414273 wikitext text/x-wiki {{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: <!-- service name --> = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! 
colspan="2" class="scroll-card-header" | 📜 Service identity |- | class="scroll-card-label" | '''Service''' || <span class="scroll-placeholder">service name</span> |- | class="scroll-card-label" | '''Owner''' || <span class="scroll-placeholder">team name</span> |- | class="scroll-card-label" | '''SCROLL bearer''' || <span class="scroll-placeholder">@sre-reviewer</span> |- | class="scroll-card-label" | '''Soft Launch Target''' (some users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''Full Launch Target''' (all users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''SCROLL epic''' || {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} |} | style="vertical-align: top; width: 60%;" | <!-- RIGHT: at-a-glance --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! colspan="2" class="scroll-card-header" | At a glance |- | class="scroll-card-label" | '''Type of Request''' || <span class="scroll-placeholder">service / extension / core feature / feature</span> |- | class="scroll-card-label" | '''Phabricator tags''' || <span class="scroll-placeholder">#tagname</span> |- | class="scroll-card-label" | '''Service Ownership and Contact Information''' || <span class="scroll-placeholder">link to team page</span> |- | class="scroll-card-label" | '''Repository URL''' || <span class="scroll-placeholder">repo URL</span> |- | class="scroll-card-label" | '''Wikitech Page URL''' || <span class="scroll-placeholder">wikitech page</span> |- | class="scroll-card-label" | '''Google Drive URL''' || <span class="scroll-placeholder">Drive URL (if applicable)</span> |- | class="scroll-card-label" | '''Design Document''' || <span class="scroll-placeholder">link</span> |- | class="scroll-card-label" | '''Service Health Dashboards''' || <span class="scroll-placeholder">Grafana link</span> |- | class="scroll-card-label" | '''Technical Runbook''' || <span 
class="scroll-placeholder">wikitech page</span> |} |} {| class="scroll-legend" |- | '''Priority:''' &nbsp; 🚀 Required for soft launch &nbsp;·&nbsp; 💯 Required for full launch &nbsp;·&nbsp; ❓ Needs scoping / may not be applicable |- | '''Required for:''' &nbsp; ⚙️ Service &nbsp;·&nbsp; 🧩 Extension &nbsp;·&nbsp; 🌻 Core Feature &nbsp;·&nbsp; ✨ Feature |} == 1. Service Summary == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Documentation |- | style="text-align: center;" | '''1.0''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Wikitech page? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Wikitech page (Template will be provided soon)</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Is your component present on the Service Catalogue? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">The Service Catalogue is the canonical inventory of WMF services. The component should have an entry there.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Contacting The Team |- | style="text-align: center;" | '''1.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Related Phabricator Tags || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">List the Phabricator project tags associated with this component. This routes bug reports and tasks to the right team</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are the team's contact details documented on the Wikitech page and verified in officewiki? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Ensure that contact info and team structure are up to date</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | External Reviews |- | style="text-align: center;" | '''1.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by SRE? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Schedule a meeting with SRE early on, both to agree target dates (soft launch, full launch) and to walk through the checklist together so you can confirm which items are relevant to your service and which can be skipped</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Security? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Security team is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Data Persistence? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Data Persistence is aware of this work and has communicated their requirements (if applicable).</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the service/feature been reviewed by the SLO working group? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">The SLO working group is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |} == 2. Operating Procedures == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Interactions |- | style="text-align: center;" | '''2.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have you described how the service/feature interacts with common MediaWiki user flows? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Engineers should understand where this component sits on the critical path and be able to assess the impact when something goes wrong.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Infrastructure |- | style="text-align: center;" | '''2.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service in Puppet's ServiceCatalogue (service.yaml)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If this is a standalone service accepting traffic, it must exist in service.yaml</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service running on VMs/Baremetal or Kubernetes? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If on baremetal/VMs please provide prefixes. If on k8s, please provide the cluster name here.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service use a helm chart? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If this is a Kubernetes deployment, it must have a Helm chart</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a Kubernetes service? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If this deployment is accepting traffic from outside of Kubernetes, it must have a service</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Service URL || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">The URL where this service can be reached.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a staging environment? If yes, please fill in the URL. 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-placeholder">&mdash;</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Traffic |- | style="text-align: center;" | '''2.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Do you have an estimation of the traffic your service will be serving? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Teams should be able to work out an estimation of what traffic they expect, as well as what methodology was used. If that's not straightforward, please reach out to SRE and we can work through it together.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.9''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service accept traffic directly from the CDN? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If your service has public endpoints, SRE Traffic may need to provide additional configuration for routing and caching.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.10''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service have a discovery URL? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If this service is either active/active or active/passive, it must have a discovery URL</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.11''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Can the service be depooled safely and run from a single DC? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If this is an active/active service, can it tolerate one datacentre being depooled without user-visible impact?</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified which systems/datastores your service needs to communicate with? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">A clear list of dependencies helps with capacity planning as well as monitoring</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Maintenance |- | style="text-align: center;" | '''2.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || If you have dependencies on maintenance scripts (mw-script) or crons (mw-cron), have they been documented and recently tested? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Maintenance scripts and crons often go untested for long periods. Documenting and testing them prevents surprises when they fail or need to be re-run.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 3. Release Confidence == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Building and Testing |- | style="text-align: center;" | '''3.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your Wikitech page link to the code repository and production branch? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Direct links to the repo and the branch running in production make it easy to find the right code</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your Wikitech page document the name and location of the most recent image? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">The container image name and location on the registry. If the image version is defined in a non-standard location, this must be documented here.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have integration and unit tests (CI)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-placeholder">&mdash;</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Initial Rollout |- | style="text-align: center;" | '''3.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified common failure points on launch day (soft and full launch)? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Knowing the likely failure modes, e.g. cold caches or an overwhelmed dependency, helps you prepare mitigations for launch day</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a rollout plan? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">A documented rollout plan covering the deployment sequence, smoke tests, rollback steps, and communication.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Deploying to Production |- | style="text-align: center;" | '''3.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have deployers in your team? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Team members ready and authorised to deploy means you can roll out changes and fixes on your own schedule, rather than waiting for help</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.7''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor your error budget during deployments? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Monitoring error budget during a deploy catches regressions early and provides a clear signal for whether to continue or roll back.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 4. Observability and incident response == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Metrics, Instrumentation, Logging |- | style="text-align: center;" | '''4.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are you exporting Prometheus metrics and sending logs to Logstash? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Prometheus and Logstash are WMF's standard tools for metrics and logs. 
Exporting to both is the baseline for any observable service.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are key user flows and business metrics instrumented and exported? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Instrumenting user-facing flows and business outcomes helps not only measure what matters to users, but also assess impact during incidents</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Level Objectives |- | style="text-align: center;" | '''4.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have SLOs been drafted to assist in evaluating the impact on end users? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.4''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have the relevant SLIs been identified and visualised in Grafana? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Monitoring |- | style="text-align: center;" | '''4.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have (a) Grafana dashboard(s)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">A Grafana dashboard used by both devs and SREs, clearly showing the component health</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Logstash dashboard? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">A Logstash dashboard surfaces application logs and errors, complementing Grafana metrics</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.9''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your dashboard include links to related dashboards, documents, and/or other URLs? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Cross-links to related dashboards as well as dependencies, runbooks, and documentation</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.10''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || If you have external dependencies, do you monitor their status? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">External dependency health (databases, APIs, third-party services) often explains service issues. Try to include panels or links to them</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.11''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor latency variations at the p50, p75, and p99 percentiles (e.g. via Envoy, or other business metrics)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Dashboard must include latency metrics.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Alerting |- | style="text-align: center;" | '''4.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified which alerts may need to page on-callers? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Identify which alerts should page on-callers, and which should only notify the dev team</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are there alerts for excessive errors (business, infrastructure, budget burn rate)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Alerts on different layers catch different failures</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.14''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are your alerts linked to runbooks? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If there are alerts, are they linked to the appropriate runbooks and/or dashboards?</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Incident Response |- | style="text-align: center;" | '''4.15''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Can the right people be found promptly when needed to address service issues? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Responders should know how to reach the dev team quickly during an incident, with clear escalation paths in place</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 5. Reliability and performance == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Resources |- | style="text-align: center;" | '''5.1''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Do you have an estimation of the resources you will need? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Estimated CPU, memory, and storage requirements drive capacity planning.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.2''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Is the service designed to scale up or down as needed? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">SRE should be able to add/remove resources on demand without contacting the team.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Reliability |- | style="text-align: center;" | '''5.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does each component have its own health or liveness check to ensure production traffic does not reach an unhealthy endpoint? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Liveness and readiness checks should be in place for Kubernetes as well as for alerting purposes</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified your system SPOFs? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Single Points of Failure are components whose loss takes down the service. 
Identifying them is the first step to mitigating or accepting the risk.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.5''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are there protections to keep the service performing reliably under pressure (rate limiting, load shedding, graceful degradation)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Under load, services should degrade gracefully rather than collapse. Patterns like rate limiting, load shedding, and circuit breakers protect both the service and its dependencies.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.6''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are backoff, retry, and fallback or failover strategies defined for the service and its dependencies? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Well-defined retry and fallback behaviour prevents a component from collapsing when dependencies misbehave.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.7''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Is the Bus Factor for this service or feature at least 2? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">At least two people should understand the service well enough to ensure its operation and longevity</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} [[Category:SCROLL reviews]] 9841atwxoz7l1cb19os1zaia5ff3ynk 2414274 2414273 2026-05-15T15:58:18Z Effie Mouzeli (WMF) 12880 2414274 wikitext text/x-wiki {{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: <!-- service name --> = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! colspan="2" class="scroll-card-header" | 📜 Service identity |- | class="scroll-card-label" | '''Service''' || <span class="scroll-placeholder">service name</span> |- | class="scroll-card-label" | '''Owner''' || <span class="scroll-placeholder">team name</span> |- | class="scroll-card-label" | '''SCROLL bearer''' || <span class="scroll-placeholder">@sre-reviewer</span> |- | class="scroll-card-label" | '''Soft Launch Target''' (some users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''Full Launch Target''' (all users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''SCROLL epic''' || {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} |} | style="vertical-align: top; width: 60%;" | <!-- RIGHT: at-a-glance --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! 
colspan="2" class="scroll-card-header" | At a glance |- | class="scroll-card-label" | '''Type of Request''' || <span class="scroll-placeholder">service / extension / core feature / feature</span> |- | class="scroll-card-label" | '''Phabricator tags''' || <span class="scroll-placeholder">#tagname</span> |- | class="scroll-card-label" | '''Service Ownership and Contact Information''' || <span class="scroll-placeholder">link to team page</span> |- | class="scroll-card-label" | '''Repository URL''' || <span class="scroll-placeholder">repo URL</span> |- | class="scroll-card-label" | '''Wikitech Page URL''' || <span class="scroll-placeholder">wikitech page</span> |- | class="scroll-card-label" | '''Google Drive URL''' || <span class="scroll-placeholder">Drive URL (if applicable)</span> |- | class="scroll-card-label" | '''Design Document''' || <span class="scroll-placeholder">link</span> |- | class="scroll-card-label" | '''Service Health Dashboards''' || <span class="scroll-placeholder">Grafana link</span> |- | class="scroll-card-label" | '''Technical Runbook''' || <span class="scroll-placeholder">wikitech page</span> |} |} {| class="scroll-legend" |- | '''Priority:''' &nbsp; 🚀 Required for soft launch &nbsp;·&nbsp; 💯 Required for full launch &nbsp;·&nbsp; ❓ Needs scoping / may not be applicable |- | '''Required for:''' &nbsp; ⚙️ Service &nbsp;·&nbsp; 🧩 Extension &nbsp;·&nbsp; 🌻 Core Feature &nbsp;·&nbsp; ✨ Feature |} == 1. Service Summary == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! 
style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Documentation |- | style="text-align: center;" | '''1.0''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Wikitech page? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Wikitech page (Template will be provided soon)</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Is your component present on the Service Catalogue? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">The Service Catalogue is the canonical inventory of WMF services. Component should have an entry there.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Contacting The Team |- | style="text-align: center;" | '''1.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Related Phabricator Tags || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">List the Phabricator project tags associated with this component. 
This routes bug reports and tasks to the right team</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are the team's contact details documented on the Wikitech page and verified in officewiki? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Ensure that contact info and team structure are up to date</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | External Reviews |- | style="text-align: center;" | '''1.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by SRE? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Schedule a meeting with SRE early on, both to agree target dates (soft launch, full launch) and to walk through the checklist together so you can confirm which items are relevant to your service and which can be skipped</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Security? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Security team is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Data Persistence? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Data Persistence is aware of this work and has communicated their requirements (if applicable).</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the service/feature been reviewed by the SLO working group? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">The SLO working group is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |} == 2. Operating Procedures == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! 
style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Interactions |- | style="text-align: center;" | '''2.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have you described how the service/feature interacts with common MediaWiki user flows? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Engineers should understand where this component sits on the critical path and be able to assess the impact when something goes wrong.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Infrastructure |- | style="text-align: center;" | '''2.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service in Puppet's ServiceCatalogue (service.yaml)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If this is a standalone service accepting traffic, it must exist in service.yaml</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service running on VMs/Baremetal or Kubernetes? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If on baremetal/VMs please provide prefixes. 
If on k8s, please provide the cluster name here.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service use a Helm chart? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If this is a Kubernetes deployment, it must have a Helm chart</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a Kubernetes service? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If this deployment is accepting traffic from outside of Kubernetes, it must have a service</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Service URL || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">The URL where this service can be reached.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a staging environment? If yes, please fill in the URL. 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-placeholder">&mdash;</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Traffic |- | style="text-align: center;" | '''2.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Do you have an estimation of the traffic your service will be serving? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Teams should be able to work out an estimation of what traffic they expect, as well as what methodology was used. If that's not straightforward, please reach out to SRE and we can work through it together.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.9''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service accept traffic directly from the CDN? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If your service has public endpoints, SRE Traffic may need to provide additional configuration for routing and caching.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.10''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service have a discovery URL? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If this service is either active/active or active/passive, it must have a discovery URL</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.11''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Can the service be depooled safely and run from a single DC? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If this is an active/active service, can it tolerate one datacentre being depooled without user-visible impact?</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified which systems/datastores your service needs to communicate with? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">A clear list of dependencies helps with capacity planning as well as monitoring</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Maintenance |- | style="text-align: center;" | '''2.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || If you have dependencies on maintenance scripts (mw-script) or crons (mw-cron), have they been documented and recently tested? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Maintenance scripts and crons often go untested for long periods. Documenting and testing them prevents surprises when they fail or need to be re-run.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 3. Release Confidence == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Building and Testing |- | style="text-align: center;" | '''3.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your Wikitech page link to the code repository and production branch? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Direct links to the repo and the branch running in production make it easy to find the right code</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your Wikitech page document the name and location of the most recent image? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">The container image name and location on the registry. If the image version is defined in a non-standard location, this must be documented here.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have integration and unit tests (CI)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-placeholder">&mdash;</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Initial Rollout |- | style="text-align: center;" | '''3.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified common failure points on launch day (soft and full launch)? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Knowing the likely failure modes, e.g. cold caches or an overwhelmed dependency, helps you prepare mitigations for launch day</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a rollout plan? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">A documented rollout plan covering the deployment sequence, smoke tests, rollback steps, and communication.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Deploying to Production |- | style="text-align: center;" | '''3.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have deployers in your team? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Team members ready and authorised to deploy means you can roll out changes and fixes on your own schedule, rather than waiting for help</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.7''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor your error budget during deployments? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Monitoring error budget during a deploy catches regressions early and provides a clear signal for whether to continue or roll back.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 4. Observability and incident response == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Metrics, Instrumentation, Logging |- | style="text-align: center;" | '''4.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are you exporting Prometheus metrics and sending logs to Logstash? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Prometheus and Logstash are WMF's standard tools for metrics and logs. 
Exporting to both is the baseline for any observable service.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are key user flows and business metrics instrumented and exported? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Instrumenting user-facing flows and business outcomes helps not only measure what matters to users, but assess impact during incidents</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Level Objectives |- | style="text-align: center;" | '''4.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have SLOs been drafted to assist in evaluating the impact on end users? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.4''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have the relevant SLIs been identified and visualised in Grafana? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Monitoring |- | style="text-align: center;" | '''4.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have one or more Grafana dashboards? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">A Grafana dashboard used by both devs and SREs, clearly showing the component health</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Logstash dashboard? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">A Logstash dashboard surfaces application logs and errors, complementing Grafana metrics</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.9''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your dashboard include links to related dashboards, documents, and/or other URLs? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Cross-links to related dashboards as well as dependencies, runbooks, and documentation</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.10''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || If you have external dependencies, do you monitor their status? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">External dependency health (databases, APIs, third-party services) often explains service issues. Try to include panels or links to them</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.11''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor latency variations at the p50, p75, and p99 percentiles (e.g. via Envoy, or other business metrics)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Dashboard must include latency metrics.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Alerting |- | style="text-align: center;" | '''4.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified which alerts may need to page on-callers? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Identify which alerts should page on-callers, and which should only notify the dev team</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are there alerts for excessive errors (business, infrastructure, budget burn rate)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Alerts on different layers catch different failures</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.14''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are your alerts linked to runbooks? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">If there are alerts, are they linked to the appropriate runbooks and/or dashboards?</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Incident Response |- | style="text-align: center;" | '''4.15''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Can the right people be found promptly when needed to address service issues?
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Responders should know how to reach the dev team quickly during an incident, with clear escalation paths in place</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 5. Reliability and performance == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Resources |- | style="text-align: center;" | '''5.1''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Do you have an estimation of the resources you will need? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Estimated CPU, memory, and storage requirements drive capacity planning.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.2''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Is the service designed to scale up or down as needed?
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">SRE should be able to add/remove resources on demand without contacting the team.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Reliability |- | style="text-align: center;" | '''5.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does each component have its own health or liveness check to ensure production traffic does not reach an unhealthy endpoint? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Liveness and readiness checks should be in place for Kubernetes as well as for alerting purposes</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified your system SPOFs? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Single Points of Failure are components whose loss takes down the service.
Identifying them is the first step to mitigating or accepting the risk.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.5''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are there protections to keep the service performing reliably under pressure (rate limiting, load shedding, graceful degradation)? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Under load, services should degrade gracefully rather than collapse. Patterns like rate limiting, load shedding, and circuit breakers protect both the service and its dependencies.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.6''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are backoff, retry, and fallback or failover strategies defined for the service and its dependencies? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Well-defined retry and fallback behaviour prevents a component from collapsing when dependencies misbehave.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.7''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Is the Bus Factor for this service or feature at least 2?
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">At least two people should understand the service well enough to ensure its operation and longevity</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} [[Category:SCROLLs]] qspqqdoyorze0g5hqpjza8yjhajao1d Deployments/Archive/2026/05 0 460158 2414309 2412180 2026-05-16T02:00:39Z DeploymentCalendarTool 20896 Add last week 2414309 wikitext text/x-wiki ==Week of May 04== ==={{Deployment_day|date=2026-05-03}}=== {{Deployment calendar event card |when=2026-05-03 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2026-05-04}}=== {{Deployment calendar event card |when=2026-05-04 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|xxb|xxb}} {{deploy|type=config|gerrit=1279477|title=nlwiki: Modify autoconfirmed requirements for nlwiki|status=}} - {{phabricator|T424898}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-04 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-04 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|manfredi|manfredi}} {{deploy|type=1.46.0-wmf.26|gerrit=1281501|title=Email confirmation banner: Remove obsolete arm_b variant|status=}} - {{phabricator|T421366}} {{deploy|type=1.46.0-wmf.26|gerrit=1281504|title=Use js promise for email confirmation banner|status=}} - {{phabricator|T420007}} {{ircnick|nya_1F616EMO|1F616EMO}} {{deploy|type=config|gerrit=1281965|title=zhwikinews: (1/2) revert 20th anniversary logo change (config)|status=}} - {{phabricator|T420165}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-04 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-04 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-05-04 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-04 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... 
}} {{Deployment calendar event card |when=2026-05-04 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|toyofuku|toyofuku}} {{deploy|type=config|gerrit=1277667|title=Enable the reading list beta feature survey on all wikipedias|status=}} - {{phabricator|T421776}} {{ircnick|manfredi|manfredi}} {{deploy|type=1.46.0-wmf.26|gerrit=1281501|title=Email confirmation banner: Remove obsolete arm_b variant|status=}} - {{phabricator|T421366}} {{deploy|type=1.46.0-wmf.26|gerrit=1282385|title=Revert^2 "Use js promise for email confirmation banner"|status=}} {{ircnick|Neriah|Neriah}} {{deploy|type=config|gerrit=1276432|title=Enable Hebrew keyboard DWIM for namespace resolution on hewikis|status=}} - {{phabricator|T412468}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-04 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-05-04 16:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-04 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.47.0-wmf.1</code> }} {{Deployment calendar event card |when=2026-05-04 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.47.0-wmf.1</code> to testwikis }} {{Deployment calendar event card |when=2026-05-04 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-05-04 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment.
}} {{Deployment calendar event card |when=2026-05-04 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-05}}=== {{Deployment calendar event card |when=2026-05-05 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|nya_1F616EMO|1F616EMO}} {{deploy|type=config|gerrit=1281967|title=zhwikinews: (2/2) revert 20th anniversary logo change (assets)|status=done}} - {{phabricator|T420165}} {{ircnick|nemo-yiannis|nemo-yiannis}} {{deploy|type=1.47.0-wmf.1|gerrit=1282723|title=Errors added below ref list dirty when not responsive|status=done}} - {{phabricator|T384599}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-05 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-05 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-05 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|MatmaRex|Bartosz}} {{deploy|type=config|gerrit=1270882|title=Remove temporary `wgOAuth2UsePrefixedSub` feature flag|status=}} - {{phabricator|T417690}} {{deploy|type=config|gerrit=1271969|title=Move privileged global and local group handling to WikimediaCustomizations|status=}} - {{phabricator|T418507}} {{ircnick|Msz2001|MSzwarc-WMF}} {{deploy|type=config|gerrit=1282850|title=Switch 'autoconfirmed' to use APCOND_AGE_FROM_EDIT on certain wikis|status=}} - {{phabricator|T418484}} {{ircnick|Jhs|Jon Harald Søby}} {{deploy|type=config|gerrit=1281964|title=Add Akan (ak) to wmgExtraLanguageNames by default|status=}} - {{phabricator|T333765}} {{phabricator|T425256}} {{ircnick|jakob_WMDE|jakob_WMDE}} {{deploy|type=1.47.0-wmf.1|gerrit=1282931|title=Fix LemmaLanguageField after core change|status=}} {{ircnick|HakanIST|HakanIST}} {{deploy|type=1.46.0-wmf.26|gerrit=1282397|title=sectionCollapsing: Scroll to fragment target on init|status=}} - {{phabricator|T425290}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-05 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-05-05 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments
and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-05 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-05-05 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-05 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-05 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jeena|Jeena}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.46.0-wmf.26->1.47.0-wmf.1|1.46.0-wmf.26|1.46.0-wmf.26}} * group0 to [[mw:MediaWiki_1.47/wmf.1|1.47.0-wmf.1]] * '''Blockers: {{phabricator|T423910}}''' }} {{Deployment calendar event card |when=2026-05-05 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|manfredi|manfredi}} {{deploy|type=1.46.0-wmf.26|gerrit=1281501|title=Email confirmation banner: Remove obsolete arm_b variant|status=}} - {{phabricator|T421366}} {{ircnick|arlolra|Arlolra}} {{deploy|type=1.46.0-wmf.26|gerrit=1282804|title=Errors added below ref 
list dirty when not responsive|status=}} - {{phabricator|T384599}} {{ircnick|AaronSchulz|AaronSchulz}} {{deploy|type=config|gerrit=1276814|title=Add wikibase.v1 module to the sandbox were it is present|status=}} - {{phabricator|T422403}} {{ircnick|Mpostoronca|Mpostoronca}} {{deploy|type=1.46.0-wmf.26|gerrit=1282930|title=hCaptcha: Add diagnostic context to script load error logs|status=}} - {{phabricator|T424496}} {{ircnick|HakanIST|HakanIST}} {{deploy|type=1.46.0-wmf.26|gerrit=1282397|title=sectionCollapsing: Scroll to fragment target on init|status=}} - {{phabricator|T425290}} {{ircnick|Neriah|Neriah}} {{deploy|type=config|gerrit=1283082|title=Enable WikiLove on shwiki|status=}} - {{phabricator|T424891}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-05 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-05 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-05-06}}=== {{Deployment calendar event card |when=2026-05-06 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|awight|Adam Wight}} {{deploy|type=1.46.0-wmf.26|gerrit=1283033|title=VE: Avoid counting all refs when listIndex is undefined|status=}} - {{phabricator|T425433}} {{ircnick|WMDE-Fisch|WMDE-Fisch}} {{deploy|type=1.47.0-wmf.1|gerrit=1283101|title=VE: Avoid counting all refs when listIndex is undefined|status=}} - {{phabricator|T425433}} {{ircnick|dcausse|dcausse}} {{deploy|type=config|gerrit=1283037|title=search: fix alt. completion indices to test keyword tokenizer|status=}} - {{phabricator|T420427}} {{deploy|type=config|gerrit=1283041|title=search: enable Latin-to-Devanagari transliteration second-chance|status=}} - {{phabricator|T425018}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-06 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-06 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-05-06 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|alexsanford|alexsanford}} {{deploy|type=1.47.0-wmf.1|gerrit=1283028|title=Add messages related to mandatory 2FA for more groups|status=}} - {{phabricator|T423119}} {{ircnick|kostajh|kostajh}} {{deploy|type=1.46.0-wmf.26|gerrit=1283050|title=Add user_groups to editAttemptStep schema|status=}} - {{phabricator|T424010}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-06 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-06 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-06 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment.
}} {{Deployment calendar event card |when=2026-05-06 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jeena|Jeena}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.1|1.46.0-wmf.26->1.47.0-wmf.1|1.46.0-wmf.26}} * group1 to [[mw:MediaWiki_1.47/wmf.1|1.47.0-wmf.1]] * '''Blockers: {{phabricator|T423910}}''' }} {{Deployment calendar event card |when=2026-05-06 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|SomeRandomDev|SomeRandomDeveloper}} {{deploy|type=config|gerrit=1281526|title=Replace use of $wgRequest|status=}} - {{phabricator|T336703}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-06 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-06 15:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-06 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-06 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-07}}=== {{Deployment calendar event card |when=2026-05-07 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|dcausse|dcausse}} {{deploy|type=config|gerrit=1269465|title=search: add alt. completion indices to test keyword tokenizer (2/2)|status=d}} - {{phabricator|T420427}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-07 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-07 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-07 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|Tran|Tran}} {{deploy|type=config|gerrit=1284553|title=Enable staggered rollout for IRS on enwiki|status=}} - {{phabricator|T424008}} {{deploy|type=1.47.0-wmf.1|gerrit=1284569|title=Fix when user is considered exposed to the feature in the experiment|status=}} - {{phabricator|T424075}} {{ircnick|James_F|James_F}} {{deploy|type=1.47.0-wmf.1|gerrit=1284547|title=Remove the progress bar|status=d}} {{deploy|type=config|gerrit=1275467|title=mc: Set server, instead of host and port, for wgWikiLambdaObjectCaches|status=d}} - {{phabricator|T423311}} {{phabricator|T423626}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-07 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. 
}} {{Deployment calendar event card |when=2026-05-07 08:00 SF |length=1 |window=Train log triage |who={{ircnick|brennen|Brennen}}, {{ircnick|jeena|Jeena}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-05-07 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-07 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-05-07 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-07 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jeena|Jeena}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.1|1.47.0-wmf.1|1.46.0-wmf.26->1.47.0-wmf.1}} * group2 to [[mw:MediaWiki_1.47/wmf.1|1.47.0-wmf.1]] * '''Blockers: {{phabricator|T423910}}''' }} {{Deployment calendar event card |when=2026-05-07 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|arlolra|Arlolra}} {{deploy|type=1.47.0-wmf.1|gerrit=1284687|title=Provide page context for LintErrorChecker|status=}} - {{phabricator|T419596}} {{ircnick|kemayo|David L}} 
{{deploy|type=config|gerrit=1284575|title=Revert "Enable mobile editor abandonment survey on enwiki"|status=}} - {{phabricator|T424102}} {{deploy|type=1.47.0-wmf.1|gerrit=1284702|title=Remove duplicate definition of EditCheckAction#isTagged|status=}} - {{phabricator|T425583}} {{deploy|type=1.47.0-wmf.1|gerrit=1284703|title=Save action filtering info in ContentBranchNodeCheck#onDocumentChange|status=}} - {{phabricator|T425583}} {{ircnick|manfredi|manfredi}} {{deploy|type=1.47.0-wmf.1|gerrit=1284771|title=Make email confirmation banner a standalone RL module|status=}} - {{phabricator|T425677}} {{ircnick|cscott|C. Scott Ananian}} {{deploy|type=1.47.0-wmf.1|gerrit=1284828|title=vendor: Update webonyx/graphql-php to ^15.32.3|status=}} {{deploy|type=1.47.0-wmf.1|gerrit=1284834|title=composer.json: Update webonyx/graphql-php to ^15.32.3|status=}} {{deploy|type=1.47.0-wmf.1|gerrit=1284832|title=vendor: Bump wikimedia/parsoid to 0.24.0-a2|status=}} - {{phabricator|T425731}} {{deploy|type=1.47.0-wmf.1|gerrit=1284837|title=Bump wikimedia/parsoid to 0.24.0-a2|status=}} - {{phabricator|T425731}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-07 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-07 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-05-08}}=== {{Deployment calendar event card |when=2026-05-08 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} {{Deployment calendar event card |when=2026-05-08 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-05-09}}=== {{Deployment calendar event card |when=2026-05-09 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of May 11== ==={{Deployment_day|date=2026-05-10}}=== {{Deployment calendar event card |when=2026-05-10 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2026-05-11}}=== {{Deployment calendar event card |when=2026-05-11 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|sfaci|sfaci}} {{deploy|type=config|gerrit=1278704|title=WikiLambdaApi: update stream configuration|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285352|title=WikiLambdaApi instrument: Sets the custom schemaID|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285406|title=editSaves: getExperiment returns a promise now|status=}} - {{phabricator|T425785}} {{ircnick|dyepezg|Daniel Yepez Garces}} {{deploy|type=config|gerrit=1283048|title=Enabling RSS extension for cowikimedia chapter|status=}} - {{phabricator|T425440}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-11 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related 
infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-11 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|yerdua_wmde|yerdua_wmde}} {{deploy|type=config|gerrit=1270482|title=Enable and configure WikiProjects prototype on WikiData beta|status=}} - {{phabricator|T421850}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1284900|title=Completely disable MediaWiki page patrolling functions on German Wikipedia|status=}} - {{phabricator|T316393}} {{ircnick|MatmaRex|Bartosz}} {{deploy|type=1.47.0-wmf.1|gerrit=1285460|title=Prevent username registration if the username previously existed|status=}} - {{phabricator|T196386}} {{deploy|type=1.47.0-wmf.1|gerrit=1285461|title=Prevent username registration if the username previously existed (v2)|status=}} - {{phabricator|T196386}} {{deploy|type=config|gerrit=1285448|title=Grant 'createpreviouslyrenamedaccount' to account creators and sysop-likes|status=}} - {{phabricator|T196386}} {{deploy|type=1.47.0-wmf.1|gerrit=1285462|title=API: Introduce list=globalusers|status=}} - {{phabricator|T261752}} {{deploy|type=1.47.0-wmf.1|gerrit=1285761|title=list=globalusers: Avoid querying group permissions with empty group list|status=}} - {{phabricator|T425859}} {{ircnick|sfaci|sfaci}} {{deploy|type=config|gerrit=1278704|title=WikiLambdaApi: update stream configuration|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285352|title=WikiLambdaApi instrument: Sets the custom schemaID|status=}} - {{phabricator|T415254}} {{deploy|type=1.47.0-wmf.1|gerrit=1285406|title=editSaves: getExperiment returns a promise now|status=}} - {{phabricator|T425785}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to 
backport or config change'' }} {{Deployment calendar event card |when=2026-05-11 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-11 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-05-11 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-11 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-05-11 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|Sergi0|Sergio Gimeno}} {{deploy|type=1.47.0-wmf.1|gerrit=1285743|title=loggedOutWarning: set lastEditor used earlier|status=}} - {{phabricator|T425604}} {{ircnick|jan_drewniak|Jan Drewniak}} * {{gerrit|1285848}} [config] Portal banner deploy {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-11 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-05-11 16:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-11 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.47.0-wmf.2</code> }} {{Deployment calendar event card |when=2026-05-11 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.47.0-wmf.2</code> to testwikis }} {{Deployment calendar event card |when=2026-05-11 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-05-11 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-11 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-12}}=== {{Deployment calendar event card |when=2026-05-12 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|dcausse|dcausse}} {{deploy|type=config|gerrit=1284628|title=cirrus: use a keywork tokenizer for the plain field for autocomplete|status=}} - {{phabricator|T420427}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-12 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1|1.47.0-wmf.1}} * group0 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-12 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-12 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-12 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=config|gerrit=1286334|title=ArticleGuidance: set sparql endpoint|status=}} - {{phabricator|T425389}} {{ircnick|yerdua_wmde|yerdua_wmde}} {{deploy|type=1.47.0-wmf.2|gerrit=1286336|title=Keep all long, non-wrapping values inside parent element|status=}} - {{phabricator|T425176}} {{ircnick|ottomata|ottomata}} {{deploy|type=1.47.0-wmf.2|gerrit=1286341|title=page_change - add revision.revert info|status=}} {{ircnick|atsukoito|atsukoito}} {{deploy|type=config|gerrit=1283711|title=translate: add opensearch-ttmserver-test|status=}} - {{phabricator|T425377}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-12 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-05-12 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. 
}} {{Deployment calendar event card |when=2026-05-12 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-05-12 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-12 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-12 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1|1.47.0-wmf.1}} * group0 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-12 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|alexsanford|alexsanford}} {{deploy|type=config|gerrit=1285905|title=Enforce 2FA requirements for phase 2 groups|status=}} - {{phabricator|T423119}} {{deploy|type=config|gerrit=1286469|title=Prepare $wgOATH2FARequiredGroupRemovalPages for phases 2 and 3|status=}} - 
{{phabricator|T423119}} {{phabricator|T423120}} {{ircnick|dbrant|Dmitry}} {{deploy|type=config|gerrit=1285930|title=docroot: Add "get_login_creds" permission to Android app.|status=}} - {{phabricator|T426010}} {{ircnick|Neriah|Neriah}} {{deploy|type=config|gerrit=1285482|title=Allow svwiki bureaucrats to remove sysop rights|status=}} - {{phabricator|T425806}} {{ircnick|VadymTS1|VadymTS1}} {{deploy|type=config|gerrit=1283048|title=Enabling RSS extension for cowikimedia chapter|status=}} - {{phabricator|T425440}} {{deploy|type=config|gerrit=1286390|title=Set $wgSignatureAllowedLintErrors to an empty array on Spanish Wiktionary|status=}} - {{phabricator|T425332}} {{ircnick|cscott|C. Scott Ananian}} {{deploy|type=1.47.0-wmf.2|gerrit=1286484|title=Bump wikimedia/parsoid to 0.24.0-a3|status=}} - {{phabricator|T425981}} {{deploy|type=1.47.0-wmf.2|gerrit=1286485|title=Bump wikimedia/parsoid to 0.24.0-a3|status=}} - {{phabricator|T425981}} {{deploy|type=1.47.0-wmf.2|gerrit=1286488|title=Disable unit tests that fail with new vendor release|status=}} {{deploy|type=1.47.0-wmf.2|gerrit=1286489|title=Skip ContentHolderTest that fails with new vendor release|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-12 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-12 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-05-13}}=== {{Deployment calendar event card |when=2026-05-13 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|atsukoito|atsukoito}} {{deploy|type=config|gerrit=1286371|title=translate: add opensearch-ttmserver-test|status=}} - {{phabricator|T425377}} {{ircnick|WMDE-Fisch|WMDE-Fisch}} {{deploy|type=config|gerrit=1286400|title=testwiki: Disable sub-ref's synthetic list defined refs on test wikis|status=}} - {{phabricator|T425967}} {{ircnick|dcausse|dcausse}} {{deploy|type=config|gerrit=1286277|title=Revert^2 "cirrus: use a keywork tokenizer for the plain field for autocomplete"|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-13 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1}} * group1 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-13 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-13 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-05-13 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=1.47.0-wmf.1|gerrit=1286359|title=Add configurable user-agent and sparql endpoint url|status=}} - {{phabricator|T425389}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1284900|title=Completely disable MediaWiki page patrolling functions on German Wikipedia|status=}} - {{phabricator|T316393}} {{ircnick|mfossati|mfossati}} {{deploy|type=1.47.0-wmf.2|gerrit=1286518|title=[Share Highlight] Exclude section edit links, footnotes from selection|status=}} - {{phabricator|T423658}} {{deploy|type=1.47.0-wmf.2|gerrit=1286838|title=Add robust color fallbacks for QuoteCard average-color styling|status=}} - {{phabricator|T425358}} {{deploy|type=1.47.0-wmf.2|gerrit=1286839|title=Fixed card width|status=}} - {{phabricator|T425710}} {{deploy|type=1.47.0-wmf.2|gerrit=1286844|title=Adjust image size to match fixed width|status=}} - {{phabricator|T425710}} {{deploy|type=1.47.0-wmf.2|gerrit=1286846|title=ShareHighlight: exclude browsers that don't support CSS has|status=}} - {{phabricator|T424873}} {{deploy|type=1.47.0-wmf.2|gerrit=1286847|title=Also skip instrumentation for unsupported browsers|status=}} - {{phabricator|T424873}} {{ircnick|Dragoniez|Dragoniez}} {{deploy|type=1.47.0-wmf.2|gerrit=1286890|title=ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries|status=}} - {{phabricator|T426033}} {{ircnick|MatmaRex|Bartosz}} 
{{deploy|type=1.47.0-wmf.1|gerrit=1286897|title=ApiQueryGlobalUsers: Fix parsing logic for legacy log_params entries|status=}} - {{phabricator|T426033}} {{deploy|type=1.47.0-wmf.1|gerrit=1286891|title=Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders|status=}} - {{phabricator|T425972}} {{deploy|type=1.47.0-wmf.2|gerrit=1286892|title=Add 'Promise-Non-Write-API-Action' to $wgAllowedCorsHeaders|status=}} - {{phabricator|T425972}} {{ircnick|kostajh|kostajh}} {{deploy|type=1.47.0-wmf.2|gerrit=1286917|title=WikiEditor: Populate user_groups in EditAttemptStep events|status=}} - {{phabricator|T424010}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-13 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-13 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-13 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-13 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2|1.47.0-wmf.1}} * group1 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-13 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|bpirkle|bpirkle}} {{deploy|type=config|gerrit=1286981|title=Revert "Add wikibase.v1 module to the sandbox were it is present"|status=}} - {{phabricator|T422403}} {{ircnick|ebernhardson|Erik B}} {{deploy|type=config|gerrit=1286997|title=Revert "cirrus: AB test query suggester variants"|status=}} - {{phabricator|T407432}} {{ircnick|Jdlrobson|Jdlrobson}} {{deploy|type=config|gerrit=1287006|title=Update small size for Swedish Wikipedia|status=}} - {{phabricator|T424910}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-13 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-05-13 15:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-05-13 23:00 SF |length=1 
|window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-05-13 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-05-14}}=== {{Deployment calendar event card |when=2026-05-14 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-14 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2}} * group2 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-14 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-14 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-05-14 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|annet|annet}} {{deploy|type=config|gerrit=1285913|title=Add ReadingLists Account Creation CTA campaign|status=}} - {{phabricator|T422169}} {{deploy|type=1.47.0-wmf.2|gerrit=1286327|title=WelcomeSurvey: Respect returnTo for campaigns skipping the survey|status=}} - {{phabricator|T422169}} {{ircnick|Nvdtn19|Nvdtn19}} {{deploy|type=config|gerrit=1216721|title=viwikivoyage: enable relatedarticle and pop-up|status=}} - {{phabricator|T405724}} {{ircnick|Krinkle|Krinkle}} {{deploy|type=config|gerrit=1269442|title=Enable wgTrackMediaRequestProvenance on remaining Wikipedias|status=}} - {{phabricator|T414338}} {{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=config|gerrit=1287043|title=Enable the Article Guidance experiment on simplewiki|status=}} - {{phabricator|T426278}} {{ircnick|mfossati|mfossati}} {{deploy|type=1.47.0-wmf.2|gerrit=1287363|title=Scale share-highlight card to fit small viewports|status=}} - {{phabricator|T426247}} {{ircnick|phuedx|Sam Smith}} {{deploy|type=1.47.0-wmf.2|gerrit=1287368|title=ext.wikimediaEvents: Add synth-aa-ncs-1 experiment|status=}} - {{phabricator|T419514}} {{ircnick|robertsky|robertsky}} {{deploy|type=config|gerrit=1287367|title=throttle rule for ESEAP Conference 2026 15-18 May 2026|status=}} - {{phabricator|T426295}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-14 07:30 SF |length=0.5 |window=Test Kitchen Experiment 
Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-05-14 08:00 SF |length=1 |window=Train log triage |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-05-14 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|Dreamy_Jazz|WBrown (WMF)}} * {{gerrit|1279281}} purge_securepoll: don't exclude private wikis {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-05-14 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-05-14 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-05-14 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.47/Roadmap#Schedule for the deployments|1.47 schedule]] {{DeployOneWeekMini|1.47.0-wmf.2|1.47.0-wmf.2|1.47.0-wmf.1->1.47.0-wmf.2}} * group2 to [[mw:MediaWiki_1.47/wmf.2|1.47.0-wmf.2]] * '''Blockers: {{phabricator|T423911}}''' }} {{Deployment calendar event card |when=2026-05-14 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|JSherman|Jsn.sherman}} {{deploy|type=config|gerrit=1192921|title=Enable AutoModerator on Italian Wikipedia|status=}} - {{phabricator|T405152}} {{deploy|type=config|gerrit=1286974|title=Enable AutoModerator on Albanian Wikipedia|status=}} - {{phabricator|T420450}} {{deploy|type=config|gerrit=1286975|title=Enable AutoModerator on Dutch Wikipedia|status=}} - {{phabricator|T425509}} {{ircnick|stephanebisson|Stephane Bisson}} {{deploy|type=config|gerrit=1287427|title=Simplewiki: include article wizard in AG experiment|status=}} - {{phabricator|T426278}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1287433|title=Restrict the changetags user right to bots and sysops on mediawiki.org|status=}} - {{phabricator|T355445}} {{ircnick|Neriah|Neriah}} {{deploy|type=config|gerrit=1287002|title=Disable wgNewUserMessageOnAutoCreate on all WMF wikis|status=}} - {{phabricator|T426206}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-05-14 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader 
teams do not typically check IRC so assume this is not being used if 5 minutes past the start {{ircnick|jan_drewniak|Jan Drewniak}} {{deploy|type=config|gerrit=1287485|title=Disable Reading Lists survey for Wikipedias|status=}} - {{phabricator|T421776}} }} {{Deployment calendar event card |when=2026-05-14 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-05-15}}=== {{Deployment calendar event card |when=2026-05-15 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-05-15 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-05-16}}=== {{Deployment calendar event card |when=2026-05-16 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} 8je1ws9ycx92yff9nhgp52webk0g9ds Annual Toolforge Survey/Ops 0 460159 2414310 2412846 2026-05-16T03:51:08Z JJMC89 7474 cleanup 2414310 wikitext text/x-wiki
== Steps ==
# Create a Phabricator task for the survey, one task per year (e.g. [[:phab:T411421]]).
# Review the survey questions and make any necessary updates.
# Draft and review the message body for the survey announcement (see message templates below).
# Generate the list of users to survey (see instructions below).
# Send the survey announcement email to the list of users.
# One week later, send a reminder with "[Reminder]" prepended to the subject.
# About a day before the survey ends, send a final reminder with "[Final reminder]" prepended to the subject.
== Announcement template == <pre> SUBJECT: Participate in the Wikimedia Cloud Services Annual Survey (2025) Hello! The survey known as the Cloud Services Annual Survey is now part of the overall Developer Satisfaction Survey. You might see this same survey in other venues. If you have already taken the survey, then you do not need to take it again. You are receiving this email because you are a member of the Toolforge project or you are a project admin for a Wikimedia Cloud VPS project. We, the Wikimedia Cloud Services team, are conducting this survey to learn more about how developers use our services and how they can be improved to address their needs. This survey will take between 10 minutes and an hour of your time, depending on how much feedback you would like to share with us in the form of free-form comments. This survey will be conducted via a third-party service, which may subject it to additional terms. For more information on privacy and data-handling, see the survey privacy statement [0]. If you agree to these terms and conditions, please go to https://wikimediafoundation.limesurvey.net/552643 to participate in the survey. The survey will end on Monday, 5th January, 2026. If you do not wish to receive future Cloud Services annual survey emails, add your Wikimedia login (i.e. your LDAP username) to the opt-out list [1]. [0]: https://foundation.wikimedia.org/wiki/Legal:Developer_Satisfaction_Survey_2025_Privacy_Statement [1]: https://wikitech.wikimedia.org/wiki/Annual_Toolforge_Survey/Opt_out </pre> == Reminder template == <pre> SUBJECT: REMINDER: Participate in the Cloud Services Survey (2025) Hello! Reminder: the Wikimedia Cloud Services team is conducting this survey to learn more about how developers use our services and how they can be improved to address their needs. We would really appreciate your feedback. Please note that we are conducting the survey together with the Developer Satisfaction Survey, so you might see this same survey in other venues. 
If you have already taken the survey, then you do not need to take it again. The survey will end on Monday, 5th January, 2026. If you have already filled out the survey, thank you! * Privacy policy: https://foundation.wikimedia.org/wiki/Legal:Developer_Satisfaction_Survey_2025_Privacy_Statement * Survey: https://wikimediafoundation.limesurvey.net/552643 * Opt out: https://wikitech.wikimedia.org/wiki/Annual_Toolforge_Survey/Opt_out </pre>
== Bulk emailing instructions ==
Tools for generating the survey list can be found at [[:gitlab:repos/cloud/wmcs/cloud-survey-ops]]. See the discussion at [[:phab:T411545]] for context. Run the scripts from any of the Toolforge <code>bastion</code> hosts.

Make a list of email addresses suitable for surveying all Toolforge maintainers and Cloud VPS project admins.
<syntaxhighlight lang="shell-session">
$ ldapsearch -x cn=project-tools | grep "member" | cut -d "=" -f2 | cut -d "," -f1 | xargs -I memberid ldapsearch -x uid=memberid | grep "mail:" | cut -d ":" -f2 > users.txt # Generate the list of Toolforge users
$ python3 make-cloudvps-email-list.py >> users.txt # Generate the Cloud VPS admins and append them to the list
$ sort -u users.txt > all-users-sorted.txt # Sort the list and remove duplicates
</syntaxhighlight>
Once the list of recipients is complete, send emails using [[Annual Toolforge Survey/Ops/SendEmailViaToolforge.py]]. Be sure to specify <code>--optout "[[Annual Toolforge Survey/Opt out]]"</code>.

Note that some users might ask by email to opt out of receiving the survey. You will have to add them to the opt-out list on the wiki page manually.
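The de-duplication step above can also be sketched in Python. This is a minimal sketch, not part of the cloud-survey-ops repository: the function name <code>normalize_addresses</code> and the script name used below are hypothetical. It assumes that addresses extracted with <code>cut -d ":" -f2</code> may carry a leading space, and that lowercasing addresses is acceptable when matching duplicates.

```python
import sys


def normalize_addresses(lines):
    """Return sorted, de-duplicated email addresses from raw text lines.

    Assumption: surrounding whitespace is not significant and addresses
    can be compared case-insensitively.
    """
    seen = set()
    for line in lines:
        address = line.strip()
        # Skip blank lines and anything that does not look like an address.
        if not address or "@" not in address:
            continue
        seen.add(address.lower())
    return sorted(seen)


if __name__ == "__main__":
    # Read raw addresses from stdin and print one cleaned address per line.
    for address in normalize_addresses(sys.stdin):
        print(address)
```

If the block were saved as <code>normalize_addresses.py</code> (a hypothetical name), it could stand in for the <code>sort -u</code> step: <code>python3 normalize_addresses.py < users.txt > all-users-sorted.txt</code>.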
9526r542iw0gnn5jr2kqu3whg0hkp2i SCROLL/Ducks 0 460177 2414256 2026-05-15T12:26:07Z Effie Mouzeli (WMF) 12880 Created page with "{{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: Duck = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! colspan="2" class="scroll-card-header" | 📜 Service i..." 2414256 wikitext text/x-wiki {{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: Duck = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! colspan="2" class="scroll-card-header" | 📜 Service identity |- | class="scroll-card-label" | '''Service''' || <span class="scroll-placeholder">service name</span> |- | class="scroll-card-label" | '''Owner''' || <span class="scroll-placeholder">team name</span> |- | class="scroll-card-label" | '''SCROLL bearer''' || <span class="scroll-placeholder">@sre-reviewer</span> |- | class="scroll-card-label" | '''Soft Launch Target''' (some users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''Full Launch Target''' (all users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''SCROLL epic''' || <span class="scroll-placeholder">T000000</span> |} | style="vertical-align: top; width: 60%;" | <!-- RIGHT: at-a-glance --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! 
colspan="2" class="scroll-card-header" | At a glance |- | class="scroll-card-label" | '''Type of Request''' || <span class="scroll-placeholder">service / extension / core feature / feature</span> |- | class="scroll-card-label" | '''Phabricator tags''' || <span class="scroll-placeholder">#tagname</span> |- | class="scroll-card-label" | '''Service Ownership and Contact Information''' || <span class="scroll-placeholder">link to team page</span> |- | class="scroll-card-label" | '''Repository URL''' || <span class="scroll-placeholder">repo URL</span> |- | class="scroll-card-label" | '''Wikitech Page URL''' || <span class="scroll-placeholder">wikitech page</span> |- | class="scroll-card-label" | '''Google Drive URL''' || <span class="scroll-placeholder">Drive URL (if applicable)</span> |- | class="scroll-card-label" | '''Design Document''' || <span class="scroll-placeholder">link</span> |- | class="scroll-card-label" | '''Service Health Dashboards''' || <span class="scroll-placeholder">Grafana link</span> |- | class="scroll-card-label" | '''Technical Runbook''' || <span class="scroll-placeholder">wikitech page</span> |} |} {| class="scroll-legend" |- | '''Priority:''' &nbsp; 🚀 Required for soft launch &nbsp;·&nbsp; 💯 Required for full launch &nbsp;·&nbsp; ❓ Needs scoping / may not be applicable |- | '''Required for:''' &nbsp; ⚙️ Service &nbsp;·&nbsp; 🧩 Extension &nbsp;·&nbsp; 🌻 Core Feature &nbsp;·&nbsp; ✨ Feature |} == 1. Service Summary == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !!
style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Documentation |- | style="text-align: center;" | '''1.0''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Wikitech page? || style="text-align: center;" | || <span class="scroll-remarks">Wikitech page (Template will be provided soon)</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Is your component present on the Service Catalogue? || style="text-align: center;" | || <span class="scroll-remarks">The Service Catalogue is the canonical inventory of WMF services. Component should have an entry there.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Contacting The Team |- | style="text-align: center;" | '''1.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Related Phabricator Tags || style="text-align: center;" | || <span class="scroll-remarks">List the Phabricator project tags associated with this component.
This routes bug reports and tasks to the right team</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are the team's contact details documented on the Wikitech page and verified in officewiki? || style="text-align: center;" | || <span class="scroll-remarks">Ensure that contact info and team structure are up to date</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | External Reviews |- | style="text-align: center;" | '''1.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by SRE? || style="text-align: center;" | || <span class="scroll-remarks">Schedule a meeting with SRE early on, both to agree on target dates (soft launch, full launch) and to walk through the checklist together so you can confirm which items are relevant to your service and which can be skipped</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Security?
|| style="text-align: center;" | || <span class="scroll-remarks">Security team is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Data Persistence? || style="text-align: center;" | || <span class="scroll-remarks">Data Persistence is aware of this work and has communicated their requirements (if applicable).</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the service/feature been reviewed by the SLO working group? || style="text-align: center;" | || <span class="scroll-remarks">The SLO working group is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |} == 2. Operating Procedures == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !!
style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Interactions |- | style="text-align: center;" | '''2.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have you described how the service/feature interacts with common MediaWiki user flows? || style="text-align: center;" | || <span class="scroll-remarks">Engineers should understand where this component sits on the critical path and be able to assess the impact when something goes wrong.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Infrastructure |- | style="text-align: center;" | '''2.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service in Puppet's ServiceCatalogue (service.yaml)? || style="text-align: center;" | || <span class="scroll-remarks">If this is a standalone service accepting traffic, it must exist in service.yaml</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service running on VMs/bare metal or Kubernetes? || style="text-align: center;" | || <span class="scroll-remarks">If on bare metal/VMs, please provide prefixes.
If on k8s, please provide the cluster name here.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service use a Helm chart? || style="text-align: center;" | || <span class="scroll-remarks">If this is a Kubernetes deployment, it must have a Helm chart</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a Kubernetes service? || style="text-align: center;" | || <span class="scroll-remarks">If this deployment is accepting traffic from outside of Kubernetes, it must have a service</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Service URL || style="text-align: center;" | || <span class="scroll-remarks">The URL where this service can be reached.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a staging environment? If yes, please fill in the URL.
|| style="text-align: center;" | || <span class="scroll-placeholder">&mdash;</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Traffic |- | style="text-align: center;" | '''2.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Do you have an estimate of the traffic your service will be serving? || style="text-align: center;" | || <span class="scroll-remarks">Teams should be able to estimate the traffic they expect and explain the methodology used. If that's not straightforward, please reach out to SRE and we can work through it together.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.9''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service accept traffic directly from the CDN? || style="text-align: center;" | || <span class="scroll-remarks">If your service has public endpoints, SRE Traffic may need to provide additional configuration for routing and caching.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.10''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service have a discovery URL?
|| style="text-align: center;" | || <span class="scroll-remarks">If this service is either active/active or active/passive, it must have a discovery URL</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.11''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Can the service be depooled safely and run from a single DC? || style="text-align: center;" | || <span class="scroll-remarks">If this is an active/active service, can it tolerate one datacentre being depooled without user-visible impact?</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified which systems/datastores your service needs to communicate with? || style="text-align: center;" | || <span class="scroll-remarks">A clear list of dependencies helps with capacity planning as well as monitoring</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Maintenance |- | style="text-align: center;" | '''2.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || If you have dependencies on maintenance scripts (mw-script) or crons (mw-cron), have they been documented and recently tested? || style="text-align: center;" | || <span class="scroll-remarks">Maintenance scripts and crons often go untested for long periods.
Documenting and testing them prevents surprises when they fail or need to be re-run.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 3. Release Confidence == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Building and Testing |- | style="text-align: center;" | '''3.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your Wikitech page link to the code repository and production branch? || style="text-align: center;" | || <span class="scroll-remarks">Direct links to the repo and the branch running in production make it easy to find the right code</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your Wikitech page document the name and location of the most recent image? || style="text-align: center;" | || <span class="scroll-remarks">The container image name and location on the registry.
If the image version is defined in a non-standard location, this must be documented here.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have integration and unit tests (CI)? || style="text-align: center;" | || <span class="scroll-placeholder">&mdash;</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Initial Rollout |- | style="text-align: center;" | '''3.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified common failure points on launch day (soft and full launch)? || style="text-align: center;" | || <span class="scroll-remarks">Knowing the likely failure modes, e.g. cold caches or an overwhelmed dependency, helps you prepare mitigations for launch day</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a rollout plan?
|| style="text-align: center;" | || <span class="scroll-remarks">A documented rollout plan covering the deployment sequence, smoke tests, rollback steps, and communication.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Deploying to Production |- | style="text-align: center;" | '''3.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have deployers in your team? || style="text-align: center;" | || <span class="scroll-remarks">Team members ready and authorised to deploy means you can roll out changes and fixes on your own schedule, rather than waiting for help</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.7''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor your error budget during deployments? || style="text-align: center;" | || <span class="scroll-remarks">Monitoring error budget during a deploy catches regressions early and provides a clear signal for whether to continue or roll back.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 4. Observability and incident response == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !!
class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Metrics, Instrumentation, Logging |- | style="text-align: center;" | '''4.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are you exporting Prometheus metrics and sending logs to Logstash? || style="text-align: center;" | || <span class="scroll-remarks">Prometheus and Logstash are WMF's standard tools for metrics and logs. Exporting to both is the baseline for any observable service.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are key user flows and business metrics instrumented and exported? || style="text-align: center;" | || <span class="scroll-remarks">Instrumenting user-facing flows and business outcomes helps not only measure what matters to users, but assess impact during incidents</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Level Objectives |- | style="text-align: center;" | '''4.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have SLOs been drafted to assist in evaluating the impact on end users?
|| style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.4''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have the relevant SLIs been identified and visualised in Grafana? || style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Monitoring |- | style="text-align: center;" | '''4.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have (a) Grafana dashboard(s)? || style="text-align: center;" | || <span class="scroll-remarks">A Grafana dashboard used by both devs and SREs, clearly showing the component health</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Logstash dashboard? || style="text-align: center;" | || <span class="scroll-remarks">A Logstash dashboard surfaces application logs and errors, complementing Grafana metrics</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.9''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your dashboard include links to related dashboards, documents, and/or other
URLs? || style="text-align: center;" | || <span class="scroll-remarks">Cross-links to related dashboards as well as dependencies, runbooks, and documentation</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.10''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || If you have external dependencies, do you monitor their status? || style="text-align: center;" | || <span class="scroll-remarks">External dependency health (databases, APIs, third-party services) often explains service issues. Try to include panels or links to them</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.11''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor latency variations at the p50, p75, and p99 percentiles (e.g. via Envoy, or other business metrics)? || style="text-align: center;" | || <span class="scroll-remarks">Dashboard must include latency metrics.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Alerting |- | style="text-align: center;" | '''4.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified which alerts may need to page on-callers?
|| style="text-align: center;" | || <span class="scroll-remarks">Identify which alerts should page on-callers, and which should only notify the dev team</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are there alerts for excessive errors (business, infrastructure, budget burn rate)? || style="text-align: center;" | || <span class="scroll-remarks">Alerts on different layers catch different failures</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.14''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are your alerts linked to runbooks? || style="text-align: center;" | || <span class="scroll-remarks">If there are alerts, are they linked to the appropriate runbooks and/or dashboards?</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Incident Response |- | style="text-align: center;" | '''4.15''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Can the right people be found promptly when needed to address service issues? || style="text-align: center;" | || <span class="scroll-remarks">Responders should know how to reach the dev team quickly during an incident, with clear escalation paths in place</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 5.
Reliability and performance == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Resources |- | style="text-align: center;" | '''5.1''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Do you have an estimation of the resources you will need? || style="text-align: center;" | || <span class="scroll-remarks">Estimated CPU, memory, and storage requirements drive capacity planning.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.2''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Is the service designed to scale up or down as needed?
|| style="text-align: center;" | || <span class="scroll-remarks">SRE should be able to add/remove resources on demand without contacting the team.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Reliability |- | style="text-align: center;" | '''5.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does each component have its own health or liveness check to ensure production traffic does not reach an unhealthy endpoint? || style="text-align: center;" | || <span class="scroll-remarks">Liveness and readiness checks should be in place for Kubernetes as well as for alerting purposes</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified your system SPOFs? || style="text-align: center;" | || <span class="scroll-remarks">Single Points of Failure are components whose loss takes down the service. Identifying them is the first step to mitigating or accepting the risk.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.5''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are there protections to keep the service performing reliably under pressure (rate limiting, load shedding, graceful degradation)? || style="text-align: center;" | || <span class="scroll-remarks">Under load, services should degrade gracefully rather than collapse.
Patterns like rate limiting, load shedding, and circuit breakers protect both the service and its dependencies.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.6''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are backoff, retry, and fallback or failover strategies defined for the service and its dependencies? || style="text-align: center;" | || <span class="scroll-remarks">Well-defined retry and fallback behaviour prevents a component from collapsing when dependencies misbehave.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.7''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Is the Bus Factor for this service or feature at least 2? || style="text-align: center;" | || <span class="scroll-remarks">At least two people should understand the service well enough to ensure its operation and longevity</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} [[Category:SCROLL reviews]] kv7t8rszizt6di5uh4a3ywpqiqwdyrl SCROLL/Duck 0 460178 2414263 2026-05-15T15:12:53Z Effie Mouzeli (WMF) 12880 Created page with "{{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: Duck = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- !
colspan="2" class="scroll-card-header" | ๐Ÿ“œ Service i..." 2414263 wikitext text/x-wiki {{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: Duck = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! colspan="2" class="scroll-card-header" | ๐Ÿ“œ Service identity |- | class="scroll-card-label" | '''Service''' || <span class="scroll-placeholder">service name</span> |- | class="scroll-card-label" | '''Owner''' || <span class="scroll-placeholder">team name</span> |- | class="scroll-card-label" | '''SCROLL bearer''' || <span class="scroll-placeholder">@sre-reviewer</span> |- | class="scroll-card-label" | '''Soft Launch Target''' (some users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''Full Launch Target''' (all users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''SCROLL epic''' || <span class="scroll-placeholder">T000000</span> |} | style="vertical-align: top; width: 60%;" | <!-- RIGHT: at-a-glance --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! 
colspan="2" class="scroll-card-header" | At a glance |- | class="scroll-card-label" | '''Type of Request''' || <span class="scroll-placeholder">service / extension / core feature / feature</span> |- | class="scroll-card-label" | '''Phabricator tags''' || <span class="scroll-placeholder">#tagname</span> |- | class="scroll-card-label" | '''Service Ownership and Contact Information''' || <span class="scroll-placeholder">link to team page</span> |- | class="scroll-card-label" | '''Repository URL''' || <span class="scroll-placeholder">repo URL</span> |- | class="scroll-card-label" | '''Wikitech Page URL''' || <span class="scroll-placeholder">wikitech page</span> |- | class="scroll-card-label" | '''Google Drive URL''' || <span class="scroll-placeholder">Drive URL (if applicable)</span> |- | class="scroll-card-label" | '''Design Document''' || <span class="scroll-placeholder">link</span> |- | class="scroll-card-label" | '''Service Health Dashboards''' || <span class="scroll-placeholder">Grafana link</span> |- | class="scroll-card-label" | '''Technical Runbook''' || <span class="scroll-placeholder">wikitech page</span> |} |} {| class="scroll-legend" |- | '''Priority:''' &nbsp; ๐Ÿš€ Required for soft launch &nbsp;ยท&nbsp; ๐Ÿ’ฏ Required for full launch &nbsp;ยท&nbsp; โ“ Needs scoping / may not be applicable |- | '''Required for:''' &nbsp; โš™๏ธ Service &nbsp;ยท&nbsp; ๐Ÿงฉ Extension &nbsp;ยท&nbsp; ๐ŸŒป Core Feature &nbsp;ยท&nbsp; โœจ Feature |} == 1. Service Summary == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! 
style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Documentation |- | style="text-align: center;" | '''1.0''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you have a Wikitech page? || style="text-align: center;" | || <span class="scroll-remarks">Wikitech page (Template will be provided soon)</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Is your component present on the Service Catalogue? || style="text-align: center;" | || <span class="scroll-remarks">The Service Catalogue is the canonical inventory of WMF services. Component should have an entry there.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Contacting The Team |- | style="text-align: center;" | '''1.3''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Related Phabricator Tags || style="text-align: center;" | || <span class="scroll-remarks">List the Phabricator project tags associated with this component. 
This routes bug reports and tasks to the right team</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.4''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are the team's contact details documented on the Wikitech page and verified in officewiki? || style="text-align: center;" | || <span class="scroll-remarks">Ensure that contact info and team structure is up to date</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | External Reviews |- | style="text-align: center;" | '''1.5''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the design been reviewed by SRE? || style="text-align: center;" | || <span class="scroll-remarks">Schedule a meeting with SRE early on, both to agree target dates (soft launch, full launch) and to walk through the checklist together so you can confirm which items are relevant to your service and which can be skipped</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.6''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the design been reviewed by Security? 
|| style="text-align: center;" | || <span class="scroll-remarks">Security team is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the design been reviewed by Data Persistence? || style="text-align: center;" | || <span class="scroll-remarks">Data Persistence is aware of this work and has communicated their requirements (if applicable).</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.8''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the service/feature been reviewed by the SLO working group? || style="text-align: center;" | || <span class="scroll-remarks">The SLO working group is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |} == 2. Operating Procedures == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! 
style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Interactions |- | style="text-align: center;" | '''2.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Have you described how the service/feature interacts with common mediawiki userflows? || style="text-align: center;" | || <span class="scroll-remarks">Engineers should understand where this component sits on the critical path and be able to assess the impact when something goes wrong.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Infrastructure |- | style="text-align: center;" | '''2.2''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Is your service in Puppet's ServiceCatalogue (service.yaml) || style="text-align: center;" | || <span class="scroll-remarks">If this is a standalone service accepting traffic, it must exist in service.yaml</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.3''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Is your service running on VMs/Baremetal or Kubernetes? || style="text-align: center;" | || <span class="scroll-remarks">If on baremetal/VMs please provide prefixes. 
If on k8s, please provide the cluster name here.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service use a Helm chart? || style="text-align: center;" | || <span class="scroll-remarks">If this is a Kubernetes deployment, it must have a Helm chart</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a Kubernetes service? || style="text-align: center;" | || <span class="scroll-remarks">If this deployment is accepting traffic from outside of Kubernetes, it must have a service</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Service URL || style="text-align: center;" | || <span class="scroll-remarks">The URL where this service can be reached.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a staging environment? If yes, please fill in the URL. 
|| style="text-align: center;" | || <span class="scroll-placeholder">โ€”</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Traffic |- | style="text-align: center;" | '''2.8''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Do you have an estimation of the traffic your service will be serving? || style="text-align: center;" | || <span class="scroll-remarks">Teams should be able to work out an estimation of what traffic they expect, as well as what methodology was used. If that's not straightforward, please reach out to SRE and we can work through it together.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.9''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Does your service accept traffic directly from the CDN? || style="text-align: center;" | || <span class="scroll-remarks">If your service has public endpoints, SRE Traffic may need to provide additional configuration for routing and caching.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.10''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Does your service have a discovery url? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this service is either active/active or active/passive, it must have a discovery URL</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.11''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Can the service be depooled safely and run from a single DC? || style="text-align: center;" | || <span class="scroll-remarks">If this is an active/active service, can it tolerate one datacentre being depooled without user-visible impact?</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified which systems/datastores your service needs to communicate with? || style="text-align: center;" | || <span class="scroll-remarks">A clear list of dependencies helps with capacity planning as well as monitoring</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Maintenance |- | style="text-align: center;" | '''2.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || If you have dependencies on maintenance scripts (mw-script) or crons (mw-cron), have they been documented and recently tested? || style="text-align: center;" | || <span class="scroll-remarks">Maintenance scripts and crons often go untested for long periods. 
Documenting and testing them prevents surprises when they fail or need to be re-run.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 3. Release Confidence == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Building and Testing |- | style="text-align: center;" | '''3.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Does your wikitech page link to the code repository and production branch? || style="text-align: center;" | || <span class="scroll-remarks">Direct links to the repo and the branch running in production make it easy to find the right code</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.2''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Does your wikitech page document the name and location of the most recent image? || style="text-align: center;" | || <span class="scroll-remarks">The container image name and location on the registry. 
If the image version is defined in a non-standard location, this must be documented here.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have integration and unit tests (CI)? || style="text-align: center;" | || <span class="scroll-placeholder">—</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Initial Rollout |- | style="text-align: center;" | '''3.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified common failure points on launch day (soft and full launch)? || style="text-align: center;" | || <span class="scroll-remarks">Knowing the likely failure modes, e.g. cold caches or an overwhelmed dependency, helps you prepare mitigations for launch day</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a rollout plan? 
|| style="text-align: center;" | || <span class="scroll-remarks">A documented rollout plan covering the deployment sequence, smoke tests, rollback steps, and communication.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Deploying to Production |- | style="text-align: center;" | '''3.6''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you have deployers in your team? || style="text-align: center;" | || <span class="scroll-remarks">Team members ready and authorised to deploy means you can roll out changes and fixes on your own schedule, rather than waiting for help</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you monitor your error budget during deployments? || style="text-align: center;" | || <span class="scroll-remarks">Monitoring error budget during a deploy catches regressions early and provides a clear signal for whether to continue or roll back.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 4. Observability and incident response == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! 
class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Metrics, Instrumentation, Logging |- | style="text-align: center;" | '''4.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are you exporting Prometheus metrics and sending logs to Logstash? || style="text-align: center;" | || <span class="scroll-remarks">Prometheus and Logstash are WMF's standard tools for metrics and logs. Exporting to both is the baseline for any observable service.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.2''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are key user flows and business metrics instrumented and exported? || style="text-align: center;" | || <span class="scroll-remarks">Instrumenting user-facing flows and business outcomes helps not only measure what matters to users, but assess impact during incidents</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Level Objectives |- | style="text-align: center;" | '''4.3''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Have SLOs been drafted to assist in evaluating the impact on end users? 
|| style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.4''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Have the relevant (SLIs) been identified and visualised in Grafana? || style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Monitoring |- | style="text-align: center;" | '''4.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you have (a) grafana dashboard(s)? || style="text-align: center;" | || <span class="scroll-remarks">A Grafana dashboard used by both devs and SREs, clearly showing the component health</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.8''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you have a logstash dashboard || style="text-align: center;" | || <span class="scroll-remarks">A Logstash dashboard surfaces application logs and errors, complementing Grafana metrics</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.9''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Does your dashboard include links to related dashboards, documents, and/or other 
URLs? || style="text-align: center;" | || <span class="scroll-remarks">Cross-links to related dashboards as well as dependencies, runbooks, and documentation</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.10''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || If you have external dependencies, do you monitor their status? || style="text-align: center;" | || <span class="scroll-remarks">External dependency health (databases, APIs, third-party services) often explains service issues. Try to include panels or links to them</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.11''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor latency variations at the p50, p75, and p99 percentiles (e.g. via Envoy, or other business metrics)? || style="text-align: center;" | || <span class="scroll-remarks">Dashboard must include latency metrics.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Alerting |- | style="text-align: center;" | '''4.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified which alerts may need to page on-callers? 
|| style="text-align: center;" | || <span class="scroll-remarks">Identify which alerts should page on-callers, and which should only notify the dev team</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.13''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are there alerts for excessive errors (business, infrastructure, budget burn rate)? || style="text-align: center;" | || <span class="scroll-remarks">Alerts on different layers catch different failures</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.14''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are your alerts linked to runbooks? || style="text-align: center;" | || <span class="scroll-remarks">If there are alerts, are they linked to the appropriate runbooks and/or dashboards?</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Incident Response |- | style="text-align: center;" | '''4.15''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Can the right people be found promptly when needed to address service issues? || style="text-align: center;" | || <span class="scroll-remarks">Responders should know how to reach the dev team quickly during an incident, with clear escalation paths in place</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 5. 
Reliability and performance == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Resources |- | style="text-align: center;" | '''5.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ || Do you have an estimation of the resources you will need? || style="text-align: center;" | || <span class="scroll-remarks">Estimated CPU, memory, and storage requirements drive capacity planning.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.2''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ || Is the service designed to scale up or down as needed? 
|| style="text-align: center;" | || <span class="scroll-remarks">SRE should be able to add/remove resources on demand without contacting the team.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Reliability |- | style="text-align: center;" | '''5.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does each component have its own health or liveness check to ensure production traffic does not reach an unhealthy endpoint? || style="text-align: center;" | || <span class="scroll-remarks">Liveness and readiness checks should be in place for Kubernetes as well as for alerting purposes</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified your system's SPOFs? || style="text-align: center;" | || <span class="scroll-remarks">Single Points of Failure are components whose loss takes down the service. Identifying them is the first step to mitigating or accepting the risk.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.5''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are there protections to keep the service performing reliably under pressure (rate limiting, load shedding, graceful degradation)? || style="text-align: center;" | || <span class="scroll-remarks">Under load, services should degrade gracefully rather than collapse. 
Patterns like rate limiting, load shedding, and circuit breakers protect both the service and its dependencies.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.6''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are backoff, retry, and fallback or failover strategies defined for the service and its dependencies? || style="text-align: center;" | || <span class="scroll-remarks">Well-defined retry and fallback behaviour prevents a component from collapsing when dependencies misbehave.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Is the Bus Factor for this service or feature at least 2? || style="text-align: center;" | || <span class="scroll-remarks">At least two people should understand the service well enough to ensure its operation and longevity</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} [[Category:SCROLL reviews]] kv7t8rszizt6di5uh4a3ywpqiqwdyrl 2414271 2414263 2026-05-15T15:46:57Z Effie Mouzeli (WMF) 12880 2414271 wikitext text/x-wiki {{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: Duck = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! 
colspan="2" class="scroll-card-header" | ๐Ÿ“œ Service identity |- | class="scroll-card-label" | '''Service''' || <span class="scroll-placeholder">service name</span> |- | class="scroll-card-label" | '''Owner''' || <span class="scroll-placeholder">team name</span> |- | class="scroll-card-label" | '''SCROLL bearer''' || <span class="scroll-placeholder">@sre-reviewer</span> |- | class="scroll-card-label" | '''Soft Launch Target''' (some users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''Full Launch Target''' (all users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''SCROLL epic''' || {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} |} | style="vertical-align: top; width: 60%;" | <!-- RIGHT: at-a-glance --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! colspan="2" class="scroll-card-header" | At a glance |- | class="scroll-card-label" | '''Type of Request''' || <span class="scroll-placeholder">service / extension / core feature / feature</span> |- | class="scroll-card-label" | '''Phabricator tags''' || <span class="scroll-placeholder">#tagname</span> |- | class="scroll-card-label" | '''Service Ownership and Contact Information''' || <span class="scroll-placeholder">link to team page</span> |- | class="scroll-card-label" | '''Repository URL''' || <span class="scroll-placeholder">repo URL</span> |- | class="scroll-card-label" | '''Wikitech Page URL''' || <span class="scroll-placeholder">wikitech page</span> |- | class="scroll-card-label" | '''Google Drive URL''' || <span class="scroll-placeholder">Drive URL (if applicable)</span> |- | class="scroll-card-label" | '''Design Document''' || <span class="scroll-placeholder">link</span> |- | class="scroll-card-label" | '''Service Health Dashboards''' || <span class="scroll-placeholder">Grafana link</span> |- | class="scroll-card-label" | '''Technical Runbook''' || <span 
class="scroll-placeholder">wikitech page</span> |} |} {| class="scroll-legend" |- | '''Priority:''' &nbsp; 🚀 Required for soft launch &nbsp;·&nbsp; 💯 Required for full launch &nbsp;·&nbsp; ❓ Needs scoping / may not be applicable |- | '''Required for:''' &nbsp; ⚙️ Service &nbsp;·&nbsp; 🧩 Extension &nbsp;·&nbsp; 🌻 Core Feature &nbsp;·&nbsp; ✨ Feature |} == 1. Service Summary == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Documentation |- | style="text-align: center;" | '''1.0''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Wikitech page? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Wikitech page (Template will be provided soon)</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Is your component present on the Service Catalogue? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T327319}} || <span class="scroll-remarks">The Service Catalogue is the canonical inventory of WMF services. Component should have an entry there.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Contacting The Team |- | style="text-align: center;" | '''1.3''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Related Phabricator Tags || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T1}} || <span class="scroll-remarks">List the Phabricator project tags associated with this component. This routes bug reports and tasks to the right team</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.4''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are the team's contact details documented on the Wikitech page and verified in officewiki? || style="text-align: center;" | || <span class="scroll-remarks">Ensure that contact info and team structure is up to date</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | External Reviews |- | style="text-align: center;" | '''1.5''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the design been reviewed by SRE? 
|| style="text-align: center;" | || <span class="scroll-remarks">Schedule a meeting with SRE early on, both to agree target dates (soft launch, full launch) and to walk through the checklist together so you can confirm which items are relevant to your service and which can be skipped</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.6''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the design been reviewed by Security? || style="text-align: center;" | || <span class="scroll-remarks">Security team is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the design been reviewed by Data Persistence? || style="text-align: center;" | || <span class="scroll-remarks">Data Persistence is aware of this work and has communicated their requirements (if applicable).</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.8''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the service/feature been reviewed by the SLO working group? || style="text-align: center;" | || <span class="scroll-remarks">The SLO working group is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |} == 2. 
Operating Procedures == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Interactions |- | style="text-align: center;" | '''2.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Have you described how the service/feature interacts with common mediawiki userflows? 
|| style="text-align: center;" | || <span class="scroll-remarks">Engineers should understand where this component sits on the critical path and be able to assess the impact when something goes wrong.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Infrastructure |- | style="text-align: center;" | '''2.2''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Is your service in Puppet's ServiceCatalogue (service.yaml) || style="text-align: center;" | || <span class="scroll-remarks">If this is a standalone service accepting traffic, it must exist in service.yaml</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.3''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Is your service running on VMs/Baremetal or Kubernetes? || style="text-align: center;" | || <span class="scroll-remarks">If on baremetal/VMs please provide prefixes. If on k8s, please provide the cluster name here.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.4''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Does your service use a helm chart? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this is a Kubernetes deployment, it must have a Helm chart</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a Kubernetes Service? || style="text-align: center;" | || <span class="scroll-remarks">If this deployment accepts traffic from outside Kubernetes, it must have a Service</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Service URL || style="text-align: center;" | || <span class="scroll-remarks">The URL where this service can be reached.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a staging environment? If yes, please fill in the URL. || style="text-align: center;" | || <span class="scroll-placeholder">—</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Traffic |- | style="text-align: center;" | '''2.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Do you have an estimation of the traffic your service will be serving? 
|| style="text-align: center;" | || <span class="scroll-remarks">Teams should be able to work out an estimation of what traffic they expect, as well as what methodology was used. If that's not straightforward, please reach out to SRE and we can work through it together.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.9''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Does your service accept traffic directly from the CDN? || style="text-align: center;" | || <span class="scroll-remarks">If your service has public endpoints, SRE Traffic may need to provide additional configuration for routing and caching.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.10''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Does your service have a discovery url? || style="text-align: center;" | || <span class="scroll-remarks">If this service is either active/active or active/passive, it must have a discovery URL</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.11''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Can the service be depooled safely and run from a single DC? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this is an active/active service, can it tolerate one datacentre being depooled without user-visible impact?</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified which systems/datastores your service needs to communicate with? || style="text-align: center;" | || <span class="scroll-remarks">A clear list of dependencies helps with capacity planning as well as monitoring</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Maintenance |- | style="text-align: center;" | '''2.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || If you have dependencies on maintenance scripts (mw-script) or crons (mw-cron), have they been documented and recently tested? || style="text-align: center;" | || <span class="scroll-remarks">Maintenance scripts and crons often go untested for long periods. Documenting and testing them prevents surprises when they fail or need to be re-run.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 3. Release Confidence == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! 
style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Building and Testing |- | style="text-align: center;" | '''3.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your Wikitech page link to the code repository and production branch? || style="text-align: center;" | || <span class="scroll-remarks">Direct links to the repo and the branch running in production make it easy to find the right code</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your Wikitech page document the name and location of the most recent image? || style="text-align: center;" | || <span class="scroll-remarks">The container image name and location on the registry. If the image version is defined in a non-standard location, this must be documented here.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have integration and unit tests (CI)? 
|| style="text-align: center;" | || <span class="scroll-placeholder">โ€”</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Initial Rollout |- | style="text-align: center;" | '''3.4''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Have you identified common failure points on launch day (soft and full launch)? || style="text-align: center;" | || <span class="scroll-remarks">Knowing the likely failure modes, eg cold caches or an overwhelmed dependency, helps you prepare mitigations for launch day</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.5''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you have a Rollout plan? || style="text-align: center;" | || <span class="scroll-remarks">A documented rollout plan covering the deployment sequence, smoke tests, rollback steps, and communication.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Deploying to Production |- | style="text-align: center;" | '''3.6''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you have deployers in your team? 
|| style="text-align: center;" | || <span class="scroll-remarks">Team members ready and authorised to deploy means you can roll out changes and fixes on your own schedule, rather than waiting for help</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you monitor your error budget during deployments? || style="text-align: center;" | || <span class="scroll-remarks">Monitoring error budget during a deploy catches regressions early and provides a clear signal for whether to continue or roll back.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 4. Observability and incident response == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Metrics, Instrumentation, Logging |- | style="text-align: center;" | '''4.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are you exporting Prometheus metrics and sending logs to Logstash? 
|| style="text-align: center;" | || <span class="scroll-remarks">Prometheus and Logstash are WMF's standard tools for metrics and logs. Exporting to both is the baseline for any observable service.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are key user flows and business metrics instrumented and exported? || style="text-align: center;" | || <span class="scroll-remarks">Instrumenting user-facing flows and business outcomes helps not only to measure what matters to users but also to assess impact during incidents</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Level Objectives |- | style="text-align: center;" | '''4.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have SLOs been drafted to assist in evaluating the impact on end users? || style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.4''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have the relevant SLIs been identified and visualised in Grafana? 
|| style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Monitoring |- | style="text-align: center;" | '''4.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have one or more Grafana dashboards? || style="text-align: center;" | || <span class="scroll-remarks">A Grafana dashboard used by both devs and SREs, clearly showing the component health</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Logstash dashboard? || style="text-align: center;" | || <span class="scroll-remarks">A Logstash dashboard surfaces application logs and errors, complementing Grafana metrics</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.9''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your dashboard include links to related dashboards, documents, and/or other URLs? 
|| style="text-align: center;" | || <span class="scroll-remarks">Cross-links to related dashboards, dependencies, runbooks, and documentation</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.10''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || If you have external dependencies, do you monitor their status? || style="text-align: center;" | || <span class="scroll-remarks">External dependency health (databases, APIs, third-party services) often explains service issues. Try to include panels or links to them</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.11''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor latency variations at the p50, p75, and p99 percentiles (e.g. via Envoy, or other business metrics)? || style="text-align: center;" | || <span class="scroll-remarks">Dashboard must include latency metrics.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Alerting |- | style="text-align: center;" | '''4.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified which alerts may need to page on-callers? 
|| style="text-align: center;" | || <span class="scroll-remarks">Identify which alerts should page on-callers, and which should only notify the dev team</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.13''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are there alerts for excessive errors (business, infrastructure, budget burn rate)? || style="text-align: center;" | || <span class="scroll-remarks">Alerts on different layers catch different failures</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.14''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are your alerts linked to runbooks? || style="text-align: center;" | || <span class="scroll-remarks">If there are alerts, are they linked to the appropriate runbooks and/or dashboards?</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Incident Response |- | style="text-align: center;" | '''4.15''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Can the right people be found promptly when needed to address service issues? || style="text-align: center;" | || <span class="scroll-remarks">Responders should know how to reach the dev team quickly during an incident, with clear escalation paths in place</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 5. 
Reliability and performance == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Resources |- | style="text-align: center;" | '''5.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ || Do you have an estimation of the resources you will need? || style="text-align: center;" | || <span class="scroll-remarks">Estimated CPU, memory, and storage requirements drive capacity planning.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.2''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ || Is the service designed to scale up or down as needed? 
|| style="text-align: center;" | || <span class="scroll-remarks">SRE should be able to add/remove resources on demand without contacting the team.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Reliability |- | style="text-align: center;" | '''5.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does each component have its own health or liveness check to ensure production traffic does not reach an unhealthy endpoint? || style="text-align: center;" | || <span class="scroll-remarks">Liveness and readiness checks should be in place for Kubernetes as well as for alerting purposes</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified your system's SPOFs? || style="text-align: center;" | || <span class="scroll-remarks">Single Points of Failure are components whose loss takes down the service. Identifying them is the first step to mitigating or accepting the risk.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.5''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are there protections to keep the service performing reliably under pressure (rate limiting, load shedding, graceful degradation)? || style="text-align: center;" | || <span class="scroll-remarks">Under load, services should degrade gracefully rather than collapse. 
Patterns like rate limiting, load shedding, and circuit breakers protect both the service and its dependencies.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.6''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are backoff, retry, and fallback or failover strategies defined for the service and its dependencies? || style="text-align: center;" | || <span class="scroll-remarks">Well-defined retry and fallback behaviour prevents a component from collapsing when dependencies misbehave.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Is the Bus Factor for this service or feature at least 2? || style="text-align: center;" | || <span class="scroll-remarks">At least two people should understand the service well enough to ensure its operation and longevity</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} [[Category:SCROLL reviews]] 6jouao4gyffntrjjinvt9otn5t10z0o 2414272 2414271 2026-05-15T15:47:49Z Effie Mouzeli (WMF) 12880 /* SCROLL: Duck */ 2414272 wikitext text/x-wiki {{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: Duck = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! 
colspan="2" class="scroll-card-header" | 📜 Service identity |- | class="scroll-card-label" | '''Service''' || <span class="scroll-placeholder">service name</span> |- | class="scroll-card-label" | '''Owner''' || <span class="scroll-placeholder">team name</span> |- | class="scroll-card-label" | '''SCROLL bearer''' || <span class="scroll-placeholder">@sre-reviewer</span> |- | class="scroll-card-label" | '''Soft Launch Target''' (some users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''Full Launch Target''' (all users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''SCROLL epic''' || {{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T424357}} |} | style="vertical-align: top; width: 60%;" | <!-- RIGHT: at-a-glance --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! colspan="2" class="scroll-card-header" | At a glance |- | class="scroll-card-label" | '''Type of Request''' || <span class="scroll-placeholder">service / extension / core feature / feature</span> |- | class="scroll-card-label" | '''Phabricator tags''' || <span class="scroll-placeholder">#tagname</span> |- | class="scroll-card-label" | '''Service Ownership and Contact Information''' || <span class="scroll-placeholder">link to team page</span> |- | class="scroll-card-label" | '''Repository URL''' || <span class="scroll-placeholder">repo URL</span> |- | class="scroll-card-label" | '''Wikitech Page URL''' || <span class="scroll-placeholder">wikitech page</span> |- | class="scroll-card-label" | '''Google Drive URL''' || <span class="scroll-placeholder">Drive URL (if applicable)</span> |- | class="scroll-card-label" | '''Design Document''' || <span class="scroll-placeholder">link</span> |- | class="scroll-card-label" | '''Service Health Dashboards''' || <span class="scroll-placeholder">Grafana link</span> |- | class="scroll-card-label" | '''Technical Runbook''' || <span 
class="scroll-placeholder">wikitech page</span> |} |} {| class="scroll-legend" |- | '''Priority:''' &nbsp; 🚀 Required for soft launch &nbsp;·&nbsp; 💯 Required for full launch &nbsp;·&nbsp; ❓ Needs scoping / may not be applicable |- | '''Required for:''' &nbsp; ⚙️ Service &nbsp;·&nbsp; 🧩 Extension &nbsp;·&nbsp; 🌻 Core Feature &nbsp;·&nbsp; ✨ Feature |} == 1. Service Summary == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Documentation |- | style="text-align: center;" | '''1.0''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Wikitech page? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Wikitech page (Template will be provided soon)</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Is your component present on the Service Catalogue? 
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T327319}} || <span class="scroll-remarks">The Service Catalogue is the canonical inventory of WMF services. Each component should have an entry there.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Contacting The Team |- | style="text-align: center;" | '''1.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Related Phabricator Tags || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T424357}}|| <span class="scroll-remarks">List the Phabricator project tags associated with this component. This routes bug reports and tasks to the right team</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are the team's contact details documented on the Wikitech page and verified in officewiki? || style="text-align: center;" | || <span class="scroll-remarks">Ensure that contact info and team structure are up to date</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | External Reviews |- | style="text-align: center;" | '''1.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by SRE? 
|| style="text-align: center;" | || <span class="scroll-remarks">Schedule a meeting with SRE early on, both to agree target dates (soft launch, full launch) and to walk through the checklist together so you can confirm which items are relevant to your service and which can be skipped</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Security? || style="text-align: center;" | || <span class="scroll-remarks">Security team is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the design been reviewed by Data Persistence? || style="text-align: center;" | || <span class="scroll-remarks">Data Persistence is aware of this work and has communicated their requirements (if applicable).</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Has the service/feature been reviewed by the SLO working group? || style="text-align: center;" | || <span class="scroll-remarks">The SLO working group is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |} == 2. 
Operating Procedures == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Interactions |- | style="text-align: center;" | '''2.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have you described how the service/feature interacts with common MediaWiki user flows? 
|| style="text-align: center;" | || <span class="scroll-remarks">Engineers should understand where this component sits on the critical path and be able to assess the impact when something goes wrong.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Infrastructure |- | style="text-align: center;" | '''2.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service in Puppet's ServiceCatalogue (service.yaml)? || style="text-align: center;" | || <span class="scroll-remarks">If this is a standalone service accepting traffic, it must exist in service.yaml</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Is your service running on VMs/bare metal or Kubernetes? || style="text-align: center;" | || <span class="scroll-remarks">If on bare metal/VMs, please provide prefixes. If on k8s, please provide the cluster name here.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service use a Helm chart? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this is a Kubernetes deployment, it must have a Helm chart</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a Kubernetes Service? || style="text-align: center;" | || <span class="scroll-remarks">If this deployment is accepting traffic from outside of Kubernetes, it must have a Service</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Service URL || style="text-align: center;" | || <span class="scroll-remarks">The URL where this service can be reached.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀❓ || style="text-align: center;" | ⚙️ || Does your service have a staging environment? If yes, please fill in the URL. || style="text-align: center;" | || <span class="scroll-placeholder">&mdash;</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Traffic |- | style="text-align: center;" | '''2.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Do you have an estimation of the traffic your service will be serving? 
|| style="text-align: center;" | || <span class="scroll-remarks">Teams should be able to work out an estimation of what traffic they expect, as well as what methodology was used. If that's not straightforward, please reach out to SRE and we can work through it together.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.9''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service accept traffic directly from the CDN? || style="text-align: center;" | || <span class="scroll-remarks">If your service has public endpoints, SRE Traffic may need to provide additional configuration for routing and caching.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.10''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your service have a discovery URL? || style="text-align: center;" | || <span class="scroll-remarks">If this service is either active/active or active/passive, it must have a discovery URL</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.11''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Can the service be depooled safely and run from a single DC? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this is an active/active service, can it tolerate one datacentre being depooled without user-visible impact?</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified which systems/datastores your service needs to communicate with? || style="text-align: center;" | || <span class="scroll-remarks">A clear list of dependencies helps with capacity planning as well as monitoring</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Maintenance |- | style="text-align: center;" | '''2.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || If you have dependencies on maintenance scripts (mw-script) or crons (mw-cron), have they been documented and recently tested? || style="text-align: center;" | || <span class="scroll-remarks">Maintenance scripts and crons often go untested for long periods. Documenting and testing them prevents surprises when they fail or need to be re-run.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 3. Release Confidence == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! 
style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Building and Testing |- | style="text-align: center;" | '''3.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your Wikitech page link to the code repository and production branch? || style="text-align: center;" | || <span class="scroll-remarks">Direct links to the repo and the branch running in production make it easy to find the right code</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your Wikitech page document the name and location of the most recent image? || style="text-align: center;" | || <span class="scroll-remarks">The container image name and location on the registry. If the image version is defined in a non-standard location, this must be documented here.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have integration and unit tests (CI)? 
|| style="text-align: center;" | || <span class="scroll-placeholder">&mdash;</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Initial Rollout |- | style="text-align: center;" | '''3.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified common failure points on launch day (soft and full launch)? || style="text-align: center;" | || <span class="scroll-remarks">Knowing the likely failure modes, e.g. cold caches or an overwhelmed dependency, helps you prepare mitigations for launch day</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.5''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a rollout plan? || style="text-align: center;" | || <span class="scroll-remarks">A documented rollout plan covering the deployment sequence, smoke tests, rollback steps, and communication.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Deploying to Production |- | style="text-align: center;" | '''3.6''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have deployers in your team? 
|| style="text-align: center;" | || <span class="scroll-remarks">Team members ready and authorised to deploy means you can roll out changes and fixes on your own schedule, rather than waiting for help</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.7''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor your error budget during deployments? || style="text-align: center;" | || <span class="scroll-remarks">Monitoring error budget during a deploy catches regressions early and provides a clear signal for whether to continue or roll back.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 4. Observability and incident response == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Metrics, Instrumentation, Logging |- | style="text-align: center;" | '''4.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are you exporting Prometheus metrics and sending logs to Logstash? 
|| style="text-align: center;" | || <span class="scroll-remarks">Prometheus and Logstash are WMF's standard tools for metrics and logs. Exporting to both is the baseline for any observable service.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Are key user flows and business metrics instrumented and exported? || style="text-align: center;" | || <span class="scroll-remarks">Instrumenting user-facing flows and business outcomes helps not only to measure what matters to users, but also to assess impact during incidents</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Level Objectives |- | style="text-align: center;" | '''4.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have SLOs been drafted to assist in evaluating the impact on end users? || style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.4''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Have the relevant SLIs been identified and visualised in Grafana? 
|| style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Monitoring |- | style="text-align: center;" | '''4.7''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have (a) Grafana dashboard(s)? || style="text-align: center;" | || <span class="scroll-remarks">A Grafana dashboard used by both devs and SREs, clearly showing the component health</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.8''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Logstash dashboard? || style="text-align: center;" | || <span class="scroll-remarks">A Logstash dashboard surfaces application logs and errors, complementing Grafana metrics</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.9''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your dashboard include links to related dashboards, documents, and/or other URLs? 
|| style="text-align: center;" | || <span class="scroll-remarks">Cross-links to related dashboards as well as dependencies, runbooks, and documentation</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.10''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || If you have external dependencies, do you monitor their status? || style="text-align: center;" | || <span class="scroll-remarks">External dependency health (databases, APIs, third-party services) often explains service issues. Try to include panels or links to them</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.11''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor latency variations at the p50, p75, and p99 percentiles (e.g. via Envoy, or other business metrics)? || style="text-align: center;" | || <span class="scroll-remarks">Dashboard must include latency metrics.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Alerting |- | style="text-align: center;" | '''4.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified which alerts may need to page on-callers? 
|| style="text-align: center;" | || <span class="scroll-remarks">Identify which alerts should page on-callers, and which should only notify the dev team</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are there alerts for excessive errors (business, infrastructure, budget burn rate)? || style="text-align: center;" | || <span class="scroll-remarks">Alerts on different layers catch different failures</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.14''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Are your alerts linked to runbooks? || style="text-align: center;" | || <span class="scroll-remarks">If there are alerts, are they linked to the appropriate runbooks and/or dashboards?</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Incident Response |- | style="text-align: center;" | '''4.15''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Can the right people be found promptly when needed to address service issues? || style="text-align: center;" | || <span class="scroll-remarks">Responders should know how to reach the dev team quickly during an incident, with clear escalation paths in place</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 5. 
Reliability and performance == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Resources |- | style="text-align: center;" | '''5.1''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Do you have an estimation of the resources you will need? || style="text-align: center;" | || <span class="scroll-remarks">Estimated CPU, memory, and storage requirements drive capacity planning.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.2''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️ || Is the service designed to scale up or down as needed? 
|| style="text-align: center;" | || <span class="scroll-remarks">SRE should be able to add/remove resources on demand without contacting the team.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Reliability |- | style="text-align: center;" | '''5.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does each component have its own health or liveness check to ensure production traffic does not reach an unhealthy endpoint? || style="text-align: center;" | || <span class="scroll-remarks">Liveness and readiness checks should be in place for Kubernetes as well as for alerting purposes</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified your system SPOFs? || style="text-align: center;" | || <span class="scroll-remarks">Single Points of Failure are components whose loss takes down the service. Identifying them is the first step to mitigating or accepting the risk.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.5''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are there protections to keep the service performing reliably under pressure (rate limiting, load shedding, graceful degradation)? || style="text-align: center;" | || <span class="scroll-remarks">Under load, services should degrade gracefully rather than collapse. 
Patterns like rate limiting, load shedding, and circuit breakers protect both the service and its dependencies.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.6''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are backoff, retry, and fallback or failover strategies defined for the service and its dependencies? || style="text-align: center;" | || <span class="scroll-remarks">Well-defined retry and fallback behaviour prevents a component from collapsing when dependencies misbehave.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.7''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Is the Bus Factor for this service or feature at least 2? || style="text-align: center;" | || <span class="scroll-remarks">At least two people should understand the service well enough to ensure its operation and longevity</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} [[Category:SCROLL reviews]] re08rydsd0twv495okel4sz19361g3i 2414276 2414272 2026-05-15T15:59:11Z Effie Mouzeli (WMF) 12880 2414276 wikitext text/x-wiki {{Draft}} <templatestyles src="User:Effie_Mouzeli_(WMF)/SCROLL/styles.css"/> = SCROLL: Duck = ''Service Checklist for Readiness, Operations, Launch and Lifecycle'' {| style="width: 100%; border: none; background: transparent;" | style="vertical-align: top; width: 38%; padding-right: 2%;" | <!-- LEFT: service identity card --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! 
colspan="2" class="scroll-card-header" | ๐Ÿ“œ Service identity |- | class="scroll-card-label" | '''Service''' || <span class="scroll-placeholder">service name</span> |- | class="scroll-card-label" | '''Owner''' || <span class="scroll-placeholder">team name</span> |- | class="scroll-card-label" | '''SCROLL bearer''' || <span class="scroll-placeholder">@sre-reviewer</span> |- | class="scroll-card-label" | '''Soft Launch Target''' (some users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''Full Launch Target''' (all users) || <span class="scroll-placeholder">YYYY-MM-DD</span> |- | class="scroll-card-label" | '''SCROLL epic''' || {{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T424357}} |} | style="vertical-align: top; width: 60%;" | <!-- RIGHT: at-a-glance --> {| class="wikitable" style="font-size: 90%; width: 100%; border: 1px solid #a2a9b1;" |- ! colspan="2" class="scroll-card-header" | At a glance |- | class="scroll-card-label" | '''Type of Request''' || <span class="scroll-placeholder">service / extension / core feature / feature</span> |- | class="scroll-card-label" | '''Phabricator tags''' || <span class="scroll-placeholder">#tagname</span> |- | class="scroll-card-label" | '''Service Ownership and Contact Information''' || <span class="scroll-placeholder">link to team page</span> |- | class="scroll-card-label" | '''Repository URL''' || <span class="scroll-placeholder">repo URL</span> |- | class="scroll-card-label" | '''Wikitech Page URL''' || <span class="scroll-placeholder">wikitech page</span> |- | class="scroll-card-label" | '''Google Drive URL''' || <span class="scroll-placeholder">Drive URL (if applicable)</span> |- | class="scroll-card-label" | '''Design Document''' || <span class="scroll-placeholder">link</span> |- | class="scroll-card-label" | '''Service Health Dashboards''' || <span class="scroll-placeholder">Grafana link</span> |- | class="scroll-card-label" | '''Technical Runbook''' || <span 
class="scroll-placeholder">wikitech page</span> |} |} {| class="scroll-legend" |- | '''Priority:''' &nbsp; 🚀 Required for soft launch &nbsp;·&nbsp; 💯 Required for full launch &nbsp;·&nbsp; ❓ Needs scoping / may not be applicable |- | '''Required for:''' &nbsp; ⚙️ Service &nbsp;·&nbsp; 🧩 Extension &nbsp;·&nbsp; 🌻 Core Feature &nbsp;·&nbsp; ✨ Feature |} == 1. Service Summary == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Documentation |- | style="text-align: center;" | '''1.0''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have a Wikitech page? || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} || <span class="scroll-remarks">Wikitech page (Template will be provided soon)</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Is your component present on the Service Catalogue?
|| style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T327319}} || <span class="scroll-remarks">The Service Catalogue is the canonical inventory of WMF services. Component should have an entry there.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Contacting The Team |- | style="text-align: center;" | '''1.3''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Related Phabricator Tags || style="text-align: center;" | {{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T424357}}|| <span class="scroll-remarks">List the Phabricator project tags associated with this component. This routes bug reports and tasks to the right team</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.4''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are the team's contact details documented on the Wikitech page and verified in officewiki? || style="text-align: center;" | || <span class="scroll-remarks">Ensure that contact info and team structure is up to date</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | External Reviews |- | style="text-align: center;" | '''1.5''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the design been reviewed by SRE? 
|| style="text-align: center;" | || <span class="scroll-remarks">Schedule a meeting with SRE early on, both to agree target dates (soft launch, full launch) and to walk through the checklist together so you can confirm which items are relevant to your service and which can be skipped</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.6''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the design been reviewed by Security? || style="text-align: center;" | || <span class="scroll-remarks">Security team is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the design been reviewed by Data Persistence? || style="text-align: center;" | || <span class="scroll-remarks">Data Persistence is aware of this work and has communicated their requirements (if applicable).</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''1.8''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Has the service/feature been reviewed by the SLO working group? || style="text-align: center;" | || <span class="scroll-remarks">The SLO working group is aware of this work and has communicated their requirements (if applicable)</span> || style="text-align: center;" | || style="text-align: center;" | || style="text-align: center;" | |} == 2. 
Operating Procedures == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Interactions |- | style="text-align: center;" | '''2.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Have you described how the service/feature interacts with common mediawiki userflows? 
|| style="text-align: center;" | || <span class="scroll-remarks">Engineers should understand where this component sits on the critical path and be able to assess the impact when something goes wrong.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Infrastructure |- | style="text-align: center;" | '''2.2''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Is your service in Puppet's ServiceCatalogue (service.yaml) || style="text-align: center;" | || <span class="scroll-remarks">If this is a standalone service accepting traffic, it must exist in service.yaml</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.3''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Is your service running on VMs/Baremetal or Kubernetes? || style="text-align: center;" | || <span class="scroll-remarks">If on baremetal/VMs please provide prefixes. If on k8s, please provide the cluster name here.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.4''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Does your service use a helm chart? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this is a kubernetes deployment , it must have a helm chart</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.5''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Does your service have a kubernetes service? || style="text-align: center;" | || <span class="scroll-remarks">If this deployment is accepting traffic from outside of kubernetes, it must have a service</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.6''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Service URL || style="text-align: center;" | || <span class="scroll-remarks">The URL where this service can be reached.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€โ“ || style="text-align: center;" | โš™๏ธ || Does your service have a staging environment? if yes, please fill in the URL. || style="text-align: center;" | || <span class="scroll-placeholder">โ€”</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Traffic |- | style="text-align: center;" | '''2.8''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Do you have an estimation of the traffic your service will be serving? 
|| style="text-align: center;" | || <span class="scroll-remarks">Teams should be able to work out an estimation of what traffic they expect, as well as what methodology was used. If that's not straightforward, please reach out to SRE and we can work through it together.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.9''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Does your service accept traffic directly from the CDN? || style="text-align: center;" | || <span class="scroll-remarks">If your service has public endpoints, SRE Traffic may need to provide additional configuration for routing and caching.</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.10''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Does your service have a discovery url? || style="text-align: center;" | || <span class="scroll-remarks">If this service is either active/active or active/passive, it must have a discovery URL</span> || style="text-align: center;" | <span class="scroll-assignee">SRE</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.11''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ || Can the service be depooled safely and run from a single DC? 
|| style="text-align: center;" | || <span class="scroll-remarks">If this is an active/active service, can it tolerate one datacentre being depooled without user-visible impact?</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''2.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified which systems/datastores your service needs to communicate with? || style="text-align: center;" | || <span class="scroll-remarks">A clear list of dependencies helps with capacity planning as well as monitoring</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Maintenance |- | style="text-align: center;" | '''2.13''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || If you have dependencies on maintenance scripts (mw-script) or crons (mw-cron), have they been documented and recently tested? || style="text-align: center;" | || <span class="scroll-remarks">Maintenance scripts and crons often go untested for long periods. Documenting and testing them prevents surprises when they fail or need to be re-run.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 3. Release Confidence == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !!
style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Building and Testing |- | style="text-align: center;" | '''3.1''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Does your Wikitech page link to the code repository and production branch? || style="text-align: center;" | || <span class="scroll-remarks">Direct links to the repo and the branch running in production make it easy to find the right code</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.2''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does your Wikitech page document the name and location of the most recent image? || style="text-align: center;" | || <span class="scroll-remarks">The container image name and location on the registry. If the image version is defined in a non-standard location, this must be documented here.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.3''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you have integration and unit tests (CI)?
|| style="text-align: center;" | || <span class="scroll-placeholder">โ€”</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Initial Rollout |- | style="text-align: center;" | '''3.4''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Have you identified common failure points on launch day (soft and full launch)? || style="text-align: center;" | || <span class="scroll-remarks">Knowing the likely failure modes, eg cold caches or an overwhelmed dependency, helps you prepare mitigations for launch day</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.5''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you have a Rollout plan? || style="text-align: center;" | || <span class="scroll-remarks">A documented rollout plan covering the deployment sequence, smoke tests, rollback steps, and communication.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Deploying to Production |- | style="text-align: center;" | '''3.6''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you have deployers in your team? 
|| style="text-align: center;" | || <span class="scroll-remarks">Team members ready and authorised to deploy means you can roll out changes and fixes on your own schedule, rather than waiting for help</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''3.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you monitor your error budget during deployments? || style="text-align: center;" | || <span class="scroll-remarks">Monitoring error budget during a deploy catches regressions early and provides a clear signal for whether to continue or roll back.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 4. Observability and incident response == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Metrics, Instrumentation, Logging |- | style="text-align: center;" | '''4.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are you exporting Prometheus metrics and sending logs to Logstash? 
|| style="text-align: center;" | || <span class="scroll-remarks">Prometheus and Logstash are WMF's standard tools for metrics and logs. Exporting to both is the baseline for any observable service.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.2''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are key user flows and business metrics instrumented and exported? || style="text-align: center;" | || <span class="scroll-remarks">Instrumenting user-facing flows and business outcomes helps not only measure what matters to users, but assess impact during incidents</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Service Level Objectives |- | style="text-align: center;" | '''4.3''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Have SLOs been drafted to assist in evaluating the impact on end users? || style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.4''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Have the relevant (SLIs) been identified and visualised in Grafana? 
|| style="text-align: center;" | || <span class="scroll-remarks">sample</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Monitoring |- | style="text-align: center;" | '''4.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you have (a) grafana dashboard(s)? || style="text-align: center;" | || <span class="scroll-remarks">A Grafana dashboard used by both devs and SREs, clearly showing the component health</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.8''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Do you have a logstash dashboard || style="text-align: center;" | || <span class="scroll-remarks">A Logstash dashboard surfaces application logs and errors, complementing Grafana metrics</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.9''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Does your dashboard include links to related dashboards, documents, and/or other URLs? 
|| style="text-align: center;" | || <span class="scroll-remarks">Cross-links to related dashboards as well as dependencies, runbooks, and documentation</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.10''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || If you have external dependencies, do you monitor their status? || style="text-align: center;" | || <span class="scroll-remarks">External dependency health (databases, APIs, third-party services) often explains service issues. Try to include panels or links to them</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.11''' || style="text-align: center;" | || style="text-align: center;" | 💯❓ || style="text-align: center;" | ⚙️🧩🌻✨ || Do you monitor latency variations at the p50, p75, and p99 percentiles (e.g. via Envoy, or other business metrics)? || style="text-align: center;" | || <span class="scroll-remarks">Dashboard must include latency metrics.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Alerting |- | style="text-align: center;" | '''4.12''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️🧩🌻✨ || Have you identified which alerts may need to page on-callers?
|| style="text-align: center;" | || <span class="scroll-remarks">Identify which alerts should page on-callers, and which should only notify the dev team</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.13''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are there alerts for excessive errors (business, infrastructure, budget burn rate)? || style="text-align: center;" | || <span class="scroll-remarks">Alerts on different layers catch different failures</span> || style="text-align: center;" | <span class="scroll-assignee">SRE &amp; Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''4.14''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏโ“ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are your alerts linked to runbooks? || style="text-align: center;" | || <span class="scroll-remarks">If there are alerts, are they linked to the appropriate runbooks and/or dashboards?</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Incident Response |- | style="text-align: center;" | '''4.15''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿš€ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Can the right people be found promptly when needed to address service issues? || style="text-align: center;" | || <span class="scroll-remarks">Responders should know how to reach the dev team quickly during an incident, with clear escalation paths in place</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} == 5. 
Reliability and performance == {| class="wikitable sortable" style="font-size: 90%; width: 100%;" |- ! style="width: 4%;" class="scroll-col-header-center" | # !! style="width: 7%;" class="scroll-col-header-center" | Status !! style="width: 8%;" class="scroll-col-header-center" | Priority !! style="width: 10%;" class="scroll-col-header-center" | Required for !! class="scroll-col-header" | Item !! style="width: 7%;" class="scroll-col-header-center" | Phab !! class="scroll-col-header" | Remarks !! style="width: 8%;" class="scroll-col-header-center" | Assignee !! style="width: 8%;" class="scroll-col-header-center" | Signed off by !! style="width: 6%;" class="scroll-col-header-center" | Date |- | colspan="10" class="scroll-section" | Resources |- | style="text-align: center;" | '''5.1''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ || Do you have an estimation of the resources you will need? || style="text-align: center;" | || <span class="scroll-remarks">Estimated CPU, memory, and storage requirements drive capacity planning.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.2''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ || Is the service designed to scale up or down as needed? 
|| style="text-align: center;" | || <span class="scroll-remarks">SRE should be able to add/remove resources on demand without contacting the team.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | colspan="10" class="scroll-section" | Reliability |- | style="text-align: center;" | '''5.3''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Does each component have its own health or liveness check to ensure production traffic does not reach an unhealthy endpoint? || style="text-align: center;" | || <span class="scroll-remarks">Liveness and readiness checks should be in place for Kubernetes as well as for alerting purposes</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.4''' || style="text-align: center;" | || style="text-align: center;" | 🚀 || style="text-align: center;" | ⚙️ || Have you identified your system SPOFs? || style="text-align: center;" | || <span class="scroll-remarks">Single Points of Failure are components whose loss takes down the service. Identifying them is the first step to mitigating or accepting the risk.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.5''' || style="text-align: center;" | || style="text-align: center;" | 💯 || style="text-align: center;" | ⚙️🧩🌻✨ || Are there protections to keep the service performing reliably under pressure (rate limiting, load shedding, graceful degradation)? || style="text-align: center;" | || <span class="scroll-remarks">Under load, services should degrade gracefully rather than collapse.
Patterns like rate limiting, load shedding, and circuit breakers protect both the service and its dependencies.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.6''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Are backoff, retry, and fallback or failover strategies defined for the service and its dependencies? || style="text-align: center;" | || <span class="scroll-remarks">Well-defined retry and fallback behaviour prevents a component from collapsing when dependencies misbehave.</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |- | style="text-align: center;" | '''5.7''' || style="text-align: center;" | || style="text-align: center;" | ๐Ÿ’ฏ || style="text-align: center;" | โš™๏ธ๐Ÿงฉ๐ŸŒปโœจ || Is the Bus Factor for this service or feature at least 2? || style="text-align: center;" | || <span class="scroll-remarks">At least two people should understand the service well enough to ensure its operation and longevity</span> || style="text-align: center;" | <span class="scroll-assignee">Team</span> || style="text-align: center;" | || style="text-align: center;" | |} [[Category:SCROLLs]] a3grryd6ekicsgh70xly0ae1ovkfehl User:Effie Mouzeli (WMF)/SCROLL/phab 2 460179 2414269 2026-05-15T15:46:27Z Effie Mouzeli (WMF) 12880 Created page with "<includeonly>{{#if:{{{1|}}}|[[phab:{{{1}}}|{{{1}}}]]|<span class="scroll-placeholder">T000000</span>}}</includeonly><noinclude> == Description == Renders a Phabricator task link that displays as just the task ID (e.g. T327319). Just for scroll atm If no task ID is supplied, the template renders a grey "T000000" placeholder, so it can be dropped into a preload skeleton as-is. 
== Usage == <code><nowiki>{{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T327319}}</nowiki></code> re..." 2414269 wikitext text/x-wiki <includeonly>{{#if:{{{1|}}}|[[phab:{{{1}}}|{{{1}}}]]|<span class="scroll-placeholder">T000000</span>}}</includeonly><noinclude> == Description == Renders a Phabricator task link that displays as just the task ID (e.g. T327319). Currently used only for SCROLL pages. If no task ID is supplied, the template renders a grey "T000000" placeholder, so it can be dropped into a preload skeleton as-is. == Usage == <code><nowiki>{{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T327319}}</nowiki></code> renders as: {{User:Effie_Mouzeli_(WMF)/SCROLL/phab|T327319}} <code><nowiki>{{User:Effie_Mouzeli_(WMF)/SCROLL/phab}}</nowiki></code> (no parameter, preload state) renders as: {{User:Effie_Mouzeli_(WMF)/SCROLL/phab}} <templatedata> { "description": "Render a Phabricator task link that displays as just the task ID, e.g. T327319.", "params": { "1": { "label": "Task ID", "description": "Phabricator task identifier including the T prefix.", "example": "T327319", "type": "string", "required": true } }, "format": "inline" } </templatedata> </noinclude> fnkoqus91uphpkbaobp50fkoeqhgte2 Category:SCROLLs 14 460180 2414279 2026-05-15T16:03:45Z Effie Mouzeli (WMF) 12880 Created page with "This category lists all [[User:Effie_Mouzeli_(WMF)/SCROLL|SCROLL]] pages: the production readiness reviews for Wikimedia Foundation services, extensions, and features. A [[User:Effie_Mouzeli_(WMF)/SCROLL/Guide|SCROLL]] (''Service Checklist for Readiness, Operations, Launch and Lifecycle'') is a wiki page per component, structured as a filterable checklist of readiness items grouped into five sections: # Service summary # Operating procedures # Release confidence # Obs..." 2414279 wikitext text/x-wiki This category lists all [[User:Effie_Mouzeli_(WMF)/SCROLL|SCROLL]] pages: the production readiness reviews for Wikimedia Foundation services, extensions, and features. 
A [[User:Effie_Mouzeli_(WMF)/SCROLL/Guide|SCROLL]] (''Service Checklist for Readiness, Operations, Launch and Lifecycle'') is a wiki page per component, structured as a filterable checklist of readiness items grouped into five sections: # Service summary # Operating procedures # Release confidence # Observability and incident response # Reliability and performance Each item carries a status, a priority, an owning team, and a sign-off. Each SCROLL goes through a Prologue (scoping and alignment), Chapter 1 (soft launch, some users), and Chapter 2 (full launch, all users), with sign-offs at each stage. == See also == * [[User:Effie_Mouzeli_(WMF)/SCROLL|SCROLL landing page]], where you can create a new SCROLL. * [[User:Effie_Mouzeli_(WMF)/SCROLL/Guide|SCROLL guide]], the first-time user guide. * [[User:Effie_Mouzeli_(WMF)/SCROLL/Template|SCROLL template]], the preload skeleton. peqn3ekn0pvaci17yf5xr3ffknjk9qc